AN UNBIASED "RATIO" ESTIMATE

Max  A.  Bershad 


THE MODEL AND NOTATION

From an infinite universe of elements a simple random sample of n/2 is taken. The mean of the variable of interest, $\bar y_1$, is observed together with the mean of an auxiliary variable, $\bar x_1$. A second sample of n/2 units is taken and the means $\bar y_2$ and $\bar x_2$ for the second sample are observed. $\bar X$, the population mean of $x$, is known.


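Let
$$r_1 = \bar y_1/\bar x_1;\qquad r_2 = \bar y_2/\bar x_2;$$
$$\bar y = \tfrac12(\bar y_1+\bar y_2);\qquad \bar x = \tfrac12(\bar x_1+\bar x_2);\qquad\text{and}\qquad r = \bar y/\bar x.$$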


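THE ESTIMATE

An unbiased estimate of $\bar Y$ is
$$\bar y_b = \tfrac12(\bar y_{b1}+\bar y_{b2}),$$
where
$$\bar y_{b1} = \bar y_1 + r_2(\bar X-\bar x_1)$$
and
$$\bar y_{b2} = \bar y_2 + r_1(\bar X-\bar x_2).$$
Because the two half samples are independent, $E\,r_2(\bar X-\bar x_1) = E(r_2)\,E(\bar X-\bar x_1) = 0$, so that $E\bar y_{b1} = \bar Y$; similarly $E\bar y_{b2} = \bar Y$.

Each of the latter two forms is analogous to
$$\bar y_r = (\bar y/\bar x)\bar X = \bar y + r(\bar X-\bar x),$$
the regular ratio estimate.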
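A minimal simulation sketch of this unbiasedness argument follows. The population (x uniform on [1, 3], y = 3 + x plus standard normal noise, so that $\bar X = 2$ and $\bar Y = 5$), the half-sample size of five, and the function names are illustrative assumptions, not part of the original note. With this population the regular ratio estimate has a small positive bias, while the average of $\bar y_b$ should settle near 5.

```python
import random

def half_sample_estimates(y1, x1, y2, x2, X_bar):
    """Return (y_b, y_r) from half-sample means (y1, x1), (y2, x2) and the known X_bar."""
    r1, r2 = y1 / x1, y2 / x2
    yb1 = y1 + r2 * (X_bar - x1)        # half sample 1 adjusted with the ratio r_2
    yb2 = y2 + r1 * (X_bar - x2)        # half sample 2 adjusted with the ratio r_1
    y_bar, x_bar = (y1 + y2) / 2, (x1 + x2) / 2
    y_r = (y_bar / x_bar) * X_bar       # regular ratio estimate
    return (yb1 + yb2) / 2, y_r

def half_means(rng, size):
    """Means of one half sample from the hypothetical population (X_bar = 2, Y_bar = 5)."""
    xs = [rng.uniform(1.0, 3.0) for _ in range(size)]
    ys = [3.0 + x + rng.gauss(0.0, 1.0) for x in xs]
    return sum(ys) / size, sum(xs) / size

def simulate(reps=100_000, half_n=5, seed=1):
    """Average y_b and y_r over many pairs of independent half samples."""
    rng = random.Random(seed)
    total_b = total_r = 0.0
    for _ in range(reps):
        y1, x1 = half_means(rng, half_n)
        y2, x2 = half_means(rng, half_n)
        yb, yr = half_sample_estimates(y1, x1, y2, x2, X_bar=2.0)
        total_b += yb
        total_r += yr
    return total_b / reps, total_r / reps

mean_yb, mean_yr = simulate()
print(f"average y_b = {mean_yb:.4f}, average y_r = {mean_yr:.4f} (true Y_bar = 5)")
```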

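THE VARIANCE OF THE ESTIMATE

The variance of the estimate is
$$\sigma^2_{\bar y_b} = \tfrac12\Big[\sigma^2_{\bar y_h} + \sigma^2_{\bar x_h}\big(\sigma^2_{r_h}+(Er_h)^2\big) - 2\sigma_{\bar x_h\bar y_h}(Er_h) + (\sigma_{\bar x_h r_h})^2\Big].$$
This may also be written as
$$\sigma^2_{\bar y_b} = \tfrac12\Big[\sigma^2_{\bar y_h-\bar x_h Er_h} + \sigma^2_{\bar x_h}\sigma^2_{r_h}\big(1+\rho^2_{\bar x_h r_h}\big)\Big],$$
where the subscript, h, is introduced to mean a half sample, i.e., of size n/2.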
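The second form can be checked against simulation. The sketch below uses the same hypothetical population as an assumption (helper names are illustrative). It estimates the half-sample moments $Er_h$, $\sigma^2_{\bar x_h}$, $\sigma^2_{r_h}$ and $\rho_{\bar x_h r_h}$ from simulated half samples, evaluates the formula, and compares it with the empirical variance of $\bar y_b$ computed from independent pairs of half samples. Agreement is only up to simulation error.

```python
import random

def half_means(rng, size=5):
    """Means of one half sample; hypothetical population x ~ U(1, 3), y = 3 + x + N(0, 1)."""
    xs = [rng.uniform(1.0, 3.0) for _ in range(size)]
    ys = [3.0 + x + rng.gauss(0.0, 1.0) for x in xs]
    return sum(ys) / size, sum(xs) / size

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def compare(pairs=100_000, X_bar=2.0, seed=2):
    rng = random.Random(seed)
    draws = [half_means(rng) for _ in range(2 * pairs)]
    ys = [y for y, _ in draws]
    xs = [x for _, x in draws]
    rs = [y / x for y, x in draws]
    e_r = mean(rs)                                   # E r_h
    d = [y - x * e_r for y, x in zip(ys, xs)]        # ybar_h - xbar_h * E r_h
    var_x, var_r, cov_xr = cov(xs, xs), cov(rs, rs), cov(xs, rs)
    rho2 = cov_xr ** 2 / (var_x * var_r)
    formula = 0.5 * (cov(d, d) + var_x * var_r * (1 + rho2))
    # Direct estimate: draw j and draw j + pairs are independent half samples.
    yb = [0.5 * (y1 + (y2 / x2) * (X_bar - x1) + y2 + (y1 / x1) * (X_bar - x2))
          for (y1, x1), (y2, x2) in zip(draws[:pairs], draws[pairs:])]
    return formula, cov(yb, yb)

formula_var, direct_var = compare()
print(f"formula: {formula_var:.5f}  simulated: {direct_var:.5f}")
```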


In  the  next  section  this  variance  will  be 
compared  with  the  mean  square  error  of  the 
regular  ratio  estimate  and  Quenouille's 
estimate. 


COMPARISON  WITH  THE  REGULAR  RATIO 
ESTIMATE  AND  QUENOUILLE'S  ESTIMATE 

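The regular ratio estimate is:
$$\bar y_r = (\bar y/\bar x)\bar X = \Big(\frac{r_1+r_2}{2}\Big)\bar X + \Big(\frac{r_1-r_2}{2}\Big)\Big(\frac{\bar x_1-\bar x_2}{2}\Big)\frac{\bar X}{(\bar x_1+\bar x_2)/2}.$$
Quenouille's¹ estimate is:
$$\bar y_q = \Big[2r-\frac{r_1+r_2}{2}\Big]\bar X = \Big(\frac{r_1+r_2}{2}\Big)\bar X + \Big(\frac{r_1-r_2}{2}\Big)(\bar x_1-\bar x_2)\frac{\bar X}{(\bar x_1+\bar x_2)/2}.$$
The new estimate may be written as
$$\bar y_b = \Big(\frac{r_1+r_2}{2}\Big)\bar X + \Big(\frac{r_1-r_2}{2}\Big)(\bar x_1-\bar x_2).$$
From these forms one would guess that the mean square errors of the biased estimates, $\bar y_r$ and $\bar y_q$, would not be very different from the variance of $\bar y_b$.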
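The three decompositions can be verified numerically. In this sketch (function names are hypothetical), each estimate is computed once from its definition and once from the common-term-plus-correction form above; the two should agree to rounding error for any positive half-sample means.

```python
import random

def estimates_direct(y1, x1, y2, x2, X_bar):
    """y_r, y_q and y_b computed from their definitions."""
    r1, r2 = y1 / x1, y2 / x2
    r = ((y1 + y2) / 2) / ((x1 + x2) / 2)
    y_r = r * X_bar
    y_q = (2 * r - (r1 + r2) / 2) * X_bar          # Quenouille's estimate
    y_b = 0.5 * (y1 + r2 * (X_bar - x1) + y2 + r1 * (X_bar - x2))
    return y_r, y_q, y_b

def estimates_decomposed(y1, x1, y2, x2, X_bar):
    """The same estimates as a common term plus a correction term."""
    r1, r2 = y1 / x1, y2 / x2
    x_bar = (x1 + x2) / 2
    common = (r1 + r2) / 2 * X_bar
    cross = (r1 - r2) / 2 * (x1 - x2)
    return (common + cross / 2 * X_bar / x_bar,    # regular ratio estimate
            common + cross * X_bar / x_bar,        # Quenouille
            common + cross)                        # new estimate

rng = random.Random(0)
args = tuple(rng.uniform(1, 10) for _ in range(5))
print(estimates_direct(*args))
print(estimates_decomposed(*args))
```

The only difference among the three is the size of the correction term: one half, one, and $\bar x/\bar X$ times the factor $(\bar X/\bar x)(r_1-r_2)(\bar x_1-\bar x_2)/2$.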

To obtain the results which follow we have assumed that
$$\left|\frac{\bar x_h-\bar X}{\bar X}\right| < 1;$$
we have used the Taylor expansion through terms of the fourth degree and have retained the terms of order $1/n^2$. We find
$$V^2_{\bar y_b} = \frac1n\big[V_x^2+V_y^2-2V_{xy}\big] + \frac1{n^2}\big[10V_x^4-20\rho V_x^3V_y+8\rho^2V_x^2V_y^2+2V_x^2V_y^2\big],$$
$$\mathrm{RMSE}_{\bar y_r} = \frac1n\big[V_x^2+V_y^2-2V_{xy}\big] + \frac1{n^2}\big[9V_x^4-18\rho V_x^3V_y+6\rho^2V_x^2V_y^2+3V_x^2V_y^2\big] - 2E\big[(\delta_y-\delta_x)^2\delta_x\big],$$
$$\mathrm{RMSE}_{\bar y_q} = \frac1n\big[V_x^2+V_y^2-2V_{xy}\big] + \frac1{n^2}\big[4V_x^4-8\rho V_x^3V_y+2\rho^2V_x^2V_y^2+2V_x^2V_y^2\big].$$
For the equations above and (without implying a value of 0) neglecting $E(\delta_y-\delta_x)^2\delta_x$, we have
$$\mathrm{RMSE}_{\bar y_r} > V^2_{\bar y_b} > \mathrm{RMSE}_{\bar y_q},$$
but the differences between the three, for many populations, would not be expected to be appreciable.

Assuming $V_y = V_x = V$ and setting $e = 1-\rho$, the comparisons are
$$V^2_{\bar y_b} = \frac{V^2}{n}(2e) + \frac{V^4}{n^2}(4e+8e^2),$$
$$\mathrm{RMSE}_{\bar y_r} = \frac{V^2}{n}(2e) + \frac{V^4}{n^2}(6e+6e^2) - 2E\big[(\delta_y-\delta_x)^2\delta_x\big].$$

¹ Quenouille, M.H. "Notes on bias in estimation." Biometrika, Vol. 43, 1956, pp. 353-360.

THE ESTIMATOR OF THE VARIANCE


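A simple and unbiased estimator for the variance of $\bar y_b$ is available, namely
$$s^2_{\bar y_b} = \bar y_b^2 - \bar y^2 + s_y^2/n,$$
where $s_y^2$ is the sample variance of y over all n units, since the expected value of this estimator is
$$E\,s^2_{\bar y_b} = (\sigma^2_{\bar y_b}+\bar Y^2) - (\sigma^2_{\bar y}+\bar Y^2) + \sigma^2_{\bar y} = \sigma^2_{\bar y_b}.$$
Probably a better estimator of the variance is
$$s'^2_{\bar y_b} = \frac12\cdot\frac{1}{n/2}\cdot\frac{1}{n/2-1}\sum_{h=1}^{2}\sum_{i=1}^{n/2}(y_{bhi}-\bar y_{bh})^2 - \frac14(\bar y_{b1}-\bar y_{b2})^2,$$
where $y_{bhi}$ is the value of the i-th unit of half sample h adjusted as in $\bar y_{bh}$, i.e. $y_{b1i} = y_{1i}+r_2(\bar X-x_{1i})$ and $y_{b2i} = y_{2i}+r_1(\bar X-x_{2i})$.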


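It is interesting that an estimator of the square of the bias of a regular ratio estimate exists. This estimator is
$$\bar y_{b1}\bar y_{b2} - \bar y^2 + s_y^2/n,$$
since its expected value is
$$E(\bar y_{b1}\bar y_{b2}) - \bar Y^2 = \big[E\,r_h(\bar X-\bar x_h)\big]^2 = (\bar X\,Er_h-\bar Y)^2 = B_h^2,$$
which is the squared bias of the regular ratio estimate based on a half sample.

A better estimate is probably
$$\frac12\cdot\frac{1}{n/2}\cdot\frac{1}{n/2-1}\sum_{h=1}^{2}\sum_{i=1}^{n/2}(y_{bhi}-\bar y_{bh})^2 - \frac12(\bar y_{b1}-\bar y_{b2})^2.$$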
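A sketch of the four estimators, assuming unit-level data $(y, x)$ are available for each half sample and that $y_{bhi}$ is formed with the other half sample's ratio, as in $\bar y_{bh}$. The data-generating population and all names are hypothetical. A useful check is that each variance estimator exceeds the corresponding squared-bias estimator by exactly $\frac14(\bar y_{b1}-\bar y_{b2})^2$, since $\bar y_b^2 - \bar y_{b1}\bar y_{b2} = \frac14(\bar y_{b1}-\bar y_{b2})^2$.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def variance_and_bias_estimators(h1, h2, X_bar):
    """h1, h2: lists of (y, x) unit values for the two half samples (n/2 units each)."""
    m = len(h1)                                   # m = n/2
    n = 2 * m
    y1, x1 = _mean([y for y, _ in h1]), _mean([x for _, x in h1])
    y2, x2 = _mean([y for y, _ in h2]), _mean([x for _, x in h2])
    r1, r2 = y1 / x1, y2 / x2
    # Unit-level values y_bhi: each half sample is adjusted with the other half's ratio.
    b1 = [y + r2 * (X_bar - x) for y, x in h1]
    b2 = [y + r1 * (X_bar - x) for y, x in h2]
    yb1, yb2 = _mean(b1), _mean(b2)
    yb = (yb1 + yb2) / 2
    ys = [y for y, _ in h1 + h2]
    y_bar = _mean(ys)
    s2_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    # (1/2) * (1/(n/2)) * (1/(n/2 - 1)) * sum over h and i of (y_bhi - ybar_bh)^2
    pooled = 0.5 * (sum((v - yb1) ** 2 for v in b1)
                    + sum((v - yb2) ** 2 for v in b2)) / (m * (m - 1))
    d2 = (yb1 - yb2) ** 2
    return {
        "var_simple": yb ** 2 - y_bar ** 2 + s2_y / n,
        "var_better": pooled - d2 / 4,
        "bias2_simple": yb1 * yb2 - y_bar ** 2 + s2_y / n,
        "bias2_better": pooled - d2 / 2,
        "d2": d2,
    }

rng = random.Random(3)

def half(size=5):
    """Hypothetical half sample: x ~ U(1, 3), y = 3 + x + N(0, 1)."""
    xs = [rng.uniform(1.0, 3.0) for _ in range(size)]
    return [(3.0 + x + rng.gauss(0.0, 1.0), x) for x in xs]

est = variance_and_bias_estimators(half(), half(), X_bar=2.0)
print({k: round(v, 4) for k, v in est.items()})
```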


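Under the same assumptions ($V_y = V_x = V$, $e = 1-\rho$), the corresponding result for Quenouille's estimate is
$$\mathrm{RMSE}_{\bar y_q} = \frac{V^2}{n}(2e) + \frac{V^4}{n^2}(4e+2e^2).$$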
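With $V_x = V_y = V$ the $1/n^2$ coefficients can be compared directly. The sketch below (illustrative names) checks that the general brackets reduce to the $e$-forms and that, ignoring the $E(\delta_y-\delta_x)^2\delta_x$ term, the ordering $\bar y_r \ge \bar y_b \ge \bar y_q$ holds for $0 \le \rho \le 1$; the gaps are $2e(1-e)$ and $6e^2$.

```python
def coefficients_e(e):
    """Coefficient of V^4/n^2 for each estimate, with V_x = V_y = V and e = 1 - rho."""
    return {"y_r": 6 * e + 6 * e**2, "y_b": 4 * e + 8 * e**2, "y_q": 4 * e + 2 * e**2}

def coefficients_general(rho):
    """The same coefficients from the general 1/n^2 brackets with V_x = V_y = 1."""
    return {"y_b": 10 - 20 * rho + 8 * rho**2 + 2,
            "y_r": 9 - 18 * rho + 6 * rho**2 + 3,
            "y_q": 4 - 8 * rho + 2 * rho**2 + 2}

for rho in (0.0, 0.5, 0.8, 0.95):
    c = coefficients_e(1 - rho)
    print(rho, {k: round(v, 4) for k, v in c.items()})
```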


THE  VARIANCE  OF  AN  ESTIMATOR  OF  THE  VARIANCE  OF  COLUMN  MEANS 

Max  A.  Bershad 


THE MODEL

We are given a universe defined by the following table of the quantities $X_{ti}$ (month t, person i):

                                  Month (t)
    Persons (i)      1       2      ...      t      ...      M       Average
        1                                                             X_.1
        2                                                             X_.2
        .
        i                            ...    X_ti    ...               X_.i
        .
      N = ∞
    Average         X_1.    X_2.    ...     X_t.    ...     X_M.      X_..

In order to estimate the quantity $\frac1M\sum_t(X_{t.}-X_{..})^2$, a simple random sample of n persons is selected and for each sample person a reading is obtained for every one of the M months.

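From this sample we wish to find an estimate $\theta$, such that
$$E\theta = \frac1M\sum_{t=1}^{M}(X_{t.}-X_{..})^2.$$
If we let
$$Y_{ti} = X_{ti}-X_{.i},$$
so that
$$Y_{t.} = X_{t.}-X_{..},$$
then
$$E\theta = \frac1M\sum_{t=1}^{M}Y_{t.}^2.$$
So that
$$E\theta = E\,Y_{t.}^2 = E\Big(\sum_{i=1}^{N}Y_{ti}/N\Big)^2 = \big[N\,EY_{ti}^2 + N(N-1)\,EY_{ti}Y_{ti'}\big]\div N^2 = E\,Y_{ti}Y_{ti'},\quad\text{since } N=\infty. \qquad (1)$$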


Note that the expectation on the left side of this equation is an expectation over all samples that result from the sample design, while the expectation on the right side refers to the expectation over samples of one month and of one pair of elements; that is, $E\,Y_{ti}Y_{ti'}$ is a shorthand way of writing
$$\Big[\sum_{t}^{M}\sum_{i\ne i'}^{N(N-1)}Y_{ti}Y_{ti'}\Big]\div MN(N-1).$$

It is also informative to note that
$$E\theta = \frac{M-1}{M}\big[E\,X_{ti}X_{ti'} - E\,X_{tk}X_{t'k'}\big] \qquad (1A)$$
where i' is not i, k' is not k, and t' is not t, but k may or may not be i.

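THE ESTIMATOR

From equation (1), it is now simple to find an unbiased estimator, namely
$$\theta = \Big[\sum_{t}^{M}\sum_{i\ne i'}^{n(n-1)}y_{ti}y_{ti'}\Big]\div Mn(n-1), \qquad (2)$$
where $y_{ti} = x_{ti}-\bar x_{.i}$ for the sample persons.

We can also express the estimator in terms of x values rather than y values:
$$\theta = \sum_t^M\Big[\Big(\sum_i^n y_{ti}\Big)^2 - \sum_i^n y_{ti}^2\Big]\div Mn(n-1)$$
$$\quad = \sum_t^M\Big[n^2\bar y_{t.}^2 - \sum_i^n y_{ti}^2\Big]\div Mn(n-1)$$
$$\quad = \frac1M\Big[\frac{n}{n-1}\sum_t^M(\bar x_{t.}-\bar x_{..})^2 - \frac{1}{n(n-1)}\sum_t^M\sum_i^n(x_{ti}-\bar x_{.i})^2\Big]. \qquad (2A)$$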
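The equivalence of (2) and (2A) is easy to confirm numerically. In this sketch, `x[t][i]` is an assumed data layout (month t, sample person i); `theta_pairs` evaluates (2) by brute force over ordered pairs $i \ne i'$, and `theta_2a` evaluates (2A).

```python
import random

def theta_pairs(x):
    """Equation (2) by brute force: x[t][i] = reading of sample person i in month t."""
    M, n = len(x), len(x[0])
    pm = [sum(x[t][i] for t in range(M)) / M for i in range(n)]     # person means x_.i
    y = [[x[t][i] - pm[i] for i in range(n)] for t in range(M)]      # y_ti = x_ti - x_.i
    total = sum(y[t][i] * y[t][j]
                for t in range(M) for i in range(n) for j in range(n) if i != j)
    return total / (M * n * (n - 1))

def theta_2a(x):
    """Equation (2A): between-month term minus a within-person correction."""
    M, n = len(x), len(x[0])
    pm = [sum(x[t][i] for t in range(M)) / M for i in range(n)]
    month = [sum(row) / n for row in x]                               # x_t.
    grand = sum(month) / M                                            # x_..
    between = sum((v - grand) ** 2 for v in month)
    within = sum((x[t][i] - pm[i]) ** 2 for t in range(M) for i in range(n))
    return (n / (n - 1) * between - within / (n * (n - 1))) / M

rng = random.Random(4)
x = [[float(rng.random() < 0.3) for _ in range(8)] for _ in range(5)]   # 0/1 readings
print(theta_pairs(x), theta_2a(x))
```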


THE VARIANCE OF THE ESTIMATOR

Since
$$Mn(n-1)\,\theta = \sum_t^M\sum_{i\ne i'}^{n(n-1)}y_{ti}y_{ti'}$$
and
$$Mn(n-1)\,E\theta = \sum_t^M\sum_{i\ne i'}^{n(n-1)}Y_{t.}^2,$$
$$Mn(n-1)\,\Delta\theta = \sum_t^M\sum_{i\ne i'}^{n(n-1)}(y_{ti}-Y_{t.})(y_{ti'}-Y_{t.}) = \sum_t^M\sum_{i\ne i'}^{n(n-1)}\Delta y_{ti}\,\Delta y_{ti'} \qquad (3)$$
where
$$\Delta\theta = \theta-E\theta;\qquad \Delta y_{ti} = y_{ti}-Y_{t.},$$
so that
$$E\,\Delta y_{ti} = 0\quad\text{and}\quad E\,\Delta y_{ti}\Delta y_{ti'} = 0,$$
by virtue of an infinite universe of persons.

Squaring both sides of (3) and taking the expected value over samples, we have
$$M^2n^2(n-1)^2\sigma_\theta^2 = 2\sum_t^M\sum_{i\ne i'}^{n(n-1)}E(\Delta y_{ti}\Delta y_{ti'})^2 + 2\sum_{t\ne t'}^{M(M-1)}\sum_{i\ne i'}^{n(n-1)}E(\Delta y_{ti}\Delta y_{ti'}\Delta y_{t'i}\Delta y_{t'i'}),$$
since the expected value of other types of terms in the square is zero,
$$\quad = 2n(n-1)\sum_t^M\big[E(\Delta y_{ti})^2\big]^2 + 2n(n-1)\sum_{t\ne t'}^{M(M-1)}\big[E\,\Delta y_{ti}\Delta y_{t'i}\big]^2,$$
by virtue of N being infinite. So that
$$\tfrac12M^2n(n-1)\,\sigma_\theta^2 = \sum_t^M(\sigma^2_{y_{ti}})^2 + \sum_{t\ne t'}^{M(M-1)}(\sigma_{y_{ti}y_{t'i}})^2. \qquad (4)$$

This is the population variance of $\theta$ in terms of functions of the Y values, but it is advantageous to express the variance in terms of X values and correlations between X values.


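If we now remember that
$$y_{ti} = x_{ti} - \sum_{t}^{M}x_{ti}/M,$$
and if we can take
$$\sigma_{x_{t(i)}x_{t'(i)}} = \rho_{|t-t'|}\,\sigma_x^2 \qquad (\rho_0 = 1)$$
for all t and t', we will have
$$\sigma^2_{y_{hi}} = \sigma_x^2\Big[1-\frac2M S_h+\frac{S}{M^2}\Big] \qquad (5)$$
and
$$\sigma_{y_{hi}y_{h'i}} = \sigma_x^2\Big[\rho_{hh'}-\frac1M(S_h+S_{h'})+\frac{S}{M^2}\Big] \qquad (6)$$
where $S_h$ is the sum of the terms in the h-th row of the following symmetric matrix with unit diagonal,
$$\begin{pmatrix}
1 & \rho_1 & \rho_2 & \rho_3 & \cdots & \rho_{M-1}\\
\rho_1 & 1 & \rho_1 & \rho_2 & \cdots & \rho_{M-2}\\
\rho_2 & \rho_1 & 1 & \rho_1 & \cdots & \rho_{M-3}\\
\rho_3 & \rho_2 & \rho_1 & 1 & \cdots & \vdots\\
\vdots & \vdots & \vdots & \vdots & \ddots & \rho_1\\
\rho_{M-1} & \rho_{M-2} & \rho_{M-3} & \cdots & \rho_1 & 1
\end{pmatrix}$$
while S is the sum of all terms in the matrix.

After squaring $\sigma^2_{y_{ti}}$ and $\sigma_{y_{ti}y_{t'i}}$, and after summing the first over the M months and the second over $t \ne t'$, we find
$$\tfrac12M^2n(n-1)\,\sigma_\theta^2 = \sigma_x^4\Big[\frac{S^2}{M^2} - \frac2M\sum_h^M S_h^2 + T\Big], \qquad (7)$$
where T is the sum of the squares of all the elements of the matrix given above.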

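We may call
$$S = M^2\bar\rho_{hk};\qquad T = M^2\big(\sigma^2_{\rho_{hk}}+\bar\rho_{hk}^2\big)$$
and
$$\sum_h^M S_h^2 = M^2\cdot M\big(\sigma^2_{\rho_{h.}}+\bar\rho_{hk}^2\big).$$
Then equation (7) becomes
$$\tfrac12n(n-1)\,\sigma_\theta^2 = \sigma_x^4\big[\sigma^2_{\rho_{hk}}-2\sigma^2_{\rho_{h.}}\big]. \qquad (8)$$
$\sigma^2_{\rho_{hk}}$ is the variance of all the terms in the matrix and $\sigma^2_{\rho_{h.}}$ is the variance of the row averages.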
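For a stationary correlation matrix the chain from (5)-(6) through (7) to (8) can be checked numerically. The sketch below sets $\sigma_x = 1$ (an assumption that only rescales everything by $\sigma_x^4$) and computes $\frac12M^2n(n-1)\sigma_\theta^2$ three ways: by summing the squared terms of (5) and (6) over all month pairs, via (7), and via (8). For the correlations used in the numerical example, all three give 1.584.

```python
def toeplitz(rhos):
    """M x M correlation matrix: unit diagonal, rho_|h-k| elsewhere (M = len(rhos) + 1)."""
    r = [1.0] + list(rhos)
    M = len(r)
    return [[r[abs(h - k)] for k in range(M)] for h in range(M)]

def pvar(values):
    """Population variance (divisor = number of values)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def bracket_from_5_6(R):
    """Sum over all (h, h') of the squared terms of (5) and (6), with sigma_x = 1."""
    M = len(R)
    S_h = [sum(row) for row in R]
    S = sum(S_h)
    return sum((R[h][k] - (S_h[h] + S_h[k]) / M + S / M**2) ** 2
               for h in range(M) for k in range(M))

def bracket_from_7(R):
    M = len(R)
    S_h = [sum(row) for row in R]
    S = sum(S_h)
    T = sum(v * v for row in R for v in row)
    return S**2 / M**2 - 2 / M * sum(s * s for s in S_h) + T

def bracket_from_8(R):
    M = len(R)
    var_hk = pvar([v for row in R for v in row])      # variance of all matrix terms
    var_h = pvar([sum(row) / M for row in R])         # variance of the row averages
    return M**2 * (var_hk - 2 * var_h)

R = toeplitz([0.5, 0.4, 0.3, 0.2])
print(bracket_from_5_6(R), bracket_from_7(R), bracket_from_8(R))
```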


A NUMERICAL EXAMPLE

Suppose $X_{ti}$ is a 0,1 variate of such nature that $X_{t.} = .04$. Suppose further that the sample size n = 300; that M = 5; and that the correlations between months are $\rho_1 = .50$, $\rho_2 = .40$, $\rho_3 = .30$, $\rho_4 = .20$.

Then $\sigma^2_{\rho_{hk}} = .0656$; $\sigma^2_{\rho_{h.}} = .0011$; $\sigma_x^2 = PQ = (.04)(.96)$, and
$$\sigma_\theta^2 = \frac{2}{n(n-1)}\,\sigma_x^4\big[\sigma^2_{\rho_{hk}}-2\sigma^2_{\rho_{h.}}\big]$$
leads to
$$\sigma_\theta = \sqrt{\frac{2}{300(299)}}\,(.04)(.96)\,\sqrt{.0656-.0022} \approx 46\times10^{-6}.$$
If $\theta$ itself is large compared to this figure the sample will yield a useful estimate of $\theta$.
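The arithmetic of the example can be reproduced as follows. The function name is illustrative, and the 0,1 variate enters only through $\sigma_x^2 = PQ$.

```python
import math

def sigma_theta_identicals(n, P, rhos):
    """Standard error of theta from equation (8) for a 0,1 variate with proportion P."""
    r = [1.0] + list(rhos)
    M = len(r)
    R = [[r[abs(h - k)] for k in range(M)] for h in range(M)]

    def pvar(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    var_hk = pvar([v for row in R for v in row])      # variance of all matrix terms
    var_h = pvar([sum(row) / M for row in R])         # variance of the row averages
    sigma2_x = P * (1 - P)                            # PQ
    var_theta = 2 / (n * (n - 1)) * sigma2_x**2 * (var_hk - 2 * var_h)
    return var_hk, var_h, math.sqrt(var_theta)

var_hk, var_h, se = sigma_theta_identicals(300, 0.04, [0.50, 0.40, 0.30, 0.20])
print(f"{var_hk:.4f} {var_h:.5f} {se:.2e}")   # prints 0.0656 0.00112 4.56e-05
```

The unrounded row-average variance is .00112, which gives $\sigma_\theta \approx 45.6\times10^{-6}$.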

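SAMPLE DESIGN WITHOUT IDENTICALS

From equation (1A) it is clear that a sample design can be used in which n different elements are observed in each of the M months instead of observing the same n elements every month. In this case the estimator is
$$\theta = \frac1M\sum_t^M(\bar x_{t.}-\bar x_{..})^2 - \frac{M-1}{M^2}\cdot\frac{1}{n(n-1)}\sum_t^M\sum_i^n(x_{ti}-\bar x_{t.})^2 \qquad (9)$$
or, in an alternate form,
$$\theta = \frac{M-1}{M^2n(n-1)}\sum_t^M\sum_{i\ne i'}^{n(n-1)}x_{ti}x_{ti'} - \frac{1}{M^2n^2}\sum_{t\ne t'}^{M(M-1)}\sum_{i,i'}^{n^2}x_{ti}x_{t'i'}, \qquad (10)$$
whose expected value is clearly correct by referring the expected value to equation (1A). Then
$$\Delta\theta = \frac{M-1}{M^2n(n-1)}\sum_t^M\sum_{i\ne i'}^{n(n-1)}\Delta x_{ti}\,\Delta x_{ti'} - \frac{1}{M^2n^2}\sum_{t\ne t'}^{M(M-1)}\sum_{i,i'}^{n^2}\Delta x_{ti}\,\Delta x_{t'i'},$$
where $\Delta\theta$ represents $\theta-E\theta$ and $\Delta x_{ti}$ represents $x_{ti}-X_{t.}$, and
$$\sigma_\theta^2 = E(\Delta\theta)^2 = 2\Big[\frac{M-1}{M^2n(n-1)}\Big]^2\sum_t^M\sum_{i\ne i'}^{n(n-1)}E(\Delta x_{ti}\,\Delta x_{ti'})^2 + 2\Big[\frac{1}{M^2n^2}\Big]^2\sum_{t\ne t'}^{M(M-1)}\sum_{i,i'}^{n^2}E(\Delta x_{ti}\,\Delta x_{t'i'})^2,$$
since the expected value of all other terms is zero. So that
$$\sigma_\theta^2 = \frac{2(M-1)^2}{M^4n(n-1)}\sum_t^M\sigma^4_{x_{t(i)}} + \frac{2}{M^4n^2}\sum_{t\ne t'}^{M(M-1)}\sigma^2_{x_{t(i)}}\sigma^2_{x_{t'(i)}} \qquad (11)$$
$$\quad = \frac{2(M-1)^2}{M^4n(n-1)}\sum_t^M\sigma^4_{x_{t(i)}} + \frac{2}{M^4n^2}\Big[\Big(\sum_t^M\sigma^2_{x_{t(i)}}\Big)^2-\sum_t^M\sigma^4_{x_{t(i)}}\Big]. \qquad (12)$$
Therefore, when $\sigma^2_{x_{t(i)}} = \sigma_x^2$ for all t, the dimension of $\sigma_\theta^2$ is
$$\sigma_\theta^2 = 2\sigma_x^4\Big[\frac{M(M-1)^2}{M^4n(n-1)}+\frac{M^2-M}{M^4n^2}\Big] = 2\sigma_x^4\,\frac{1}{n(n-1)}\cdot\frac1M\Big(1-\frac1M\Big)\cdot\frac{Mn-1}{Mn}. \qquad (13)$$
Except for a factor of $(Mn-1)/Mn$ this result can be obtained as a special case of the sample design for identicals by substituting $\rho_1 = \rho_2 = \cdots = \rho_{M-1} = 0$ in equation (8). For this special case
$$\sigma^2_{\rho_{hk}} = \frac1M\Big(1-\frac1M\Big)\quad\text{and}\quad\sigma^2_{\rho_{h.}} = 0.$$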
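A numerical check of the non-identical design, as a sketch: `x[t]` holds readings on the n different persons observed in month t (an assumed layout), and the names are illustrative. The two forms (9) and (10) should agree exactly, and (12) with a common $\sigma_x^2$ should reduce to the closed form (13).

```python
import random

def theta_eq9(x):
    """x[t] holds the readings of the n different persons observed in month t."""
    M, n = len(x), len(x[0])
    means = [sum(row) / n for row in x]                       # x_t.
    grand = sum(means) / M                                    # x_..
    between = sum((m - grand) ** 2 for m in means) / M
    within = sum((v - means[t]) ** 2 for t in range(M) for v in x[t])
    return between - (M - 1) / M**2 * within / (n * (n - 1))

def theta_eq10(x):
    M, n = len(x), len(x[0])
    A = [sum(row) for row in x]                               # sum_i x_ti
    B = [sum(v * v for v in row) for row in x]                # sum_i x_ti^2
    same_month = sum(a * a - b for a, b in zip(A, B))         # sum over t, i != i'
    cross_month = sum(A) ** 2 - sum(a * a for a in A)         # sum over t != t', all i, i'
    return (M - 1) / (M**2 * n * (n - 1)) * same_month - cross_month / (M**2 * n**2)

def var_eq12(sig2, n):
    """Equation (12); sig2[t] is the element variance in month t."""
    M = len(sig2)
    s4 = sum(s * s for s in sig2)
    return (2 * (M - 1) ** 2 / (M**4 * n * (n - 1)) * s4
            + 2 / (M**4 * n**2) * (sum(sig2) ** 2 - s4))

def var_eq13(sigma2_x, M, n):
    """Equation (13): common variance sigma2_x in every month."""
    return 2 * sigma2_x**2 / (n * (n - 1)) * (1 / M) * (1 - 1 / M) * (M * n - 1) / (M * n)

rng = random.Random(5)
x = [[float(rng.random() < 0.04) for _ in range(300)] for _ in range(5)]
print(theta_eq9(x), theta_eq10(x), var_eq13(0.04 * 0.96, 5, 300))
```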


An interesting question is whether the sample design with identicals must always be better than the one with non-identicals. In the illustrative example
$$\sigma^2_{\rho_{hk}}-2\sigma^2_{\rho_{h.}} = .0634$$
while
$$\frac1M\Big(1-\frac1M\Big) = \frac15\cdot\frac45 = .16,$$
so that "identicals" are better. But the broader question to be investigated is what restraints are imposed on the correlation coefficients by the assumption of stationarity, i.e. that
$$\sigma_{x_tx_{t+e}} = \rho_e\,\sigma_x^2$$
for all t.


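In the meantime one can make some progress by considering not the symmetrical matrix with unit diagonal but the same matrix with the unit diagonal removed. That is, for example, to consider
$$\begin{pmatrix}\rho_1 & \rho_2\\ \rho_1 & \rho_1\\ \rho_2 & \rho_1\end{pmatrix}\quad\text{in place of}\quad\begin{pmatrix}1 & \rho_1 & \rho_2\\ \rho_1 & 1 & \rho_1\\ \rho_2 & \rho_1 & 1\end{pmatrix}.$$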


Utilizing equation (7), we find that equation (8) may be expressed in terms of $\bar\rho'_{hk}$, $\sigma^2_{\rho'_{hk}}$, and $\sigma^2_{\rho'_{h.}}$, where the primes on the correlations indicate that the parameters refer to the revised matrix of correlation coefficients with the 1's removed.

More specifically, it is found that for the identical design, equation (8) becomes
$$\frac{M^2n(n-1)}{2(M-1)\sigma_x^4}\,\sigma_\theta^2 = (1-\bar\rho'_{hk})^2 + M\sigma^2_{\rho'_{hk}} - 2(M-1)\sigma^2_{\rho'_{h.}}. \qquad (14)$$
Since
$$\sigma^2_{\rho'_{h.}} = \frac{\sigma^2_{\rho'_{hk}}}{M-1}\big[1+(M-2)\delta\big],$$
where $\delta$ is the intra-class correlation coefficient when the row of the revised matrix is regarded as a cluster,
$$\frac{M^2n(n-1)}{2(M-1)\sigma_x^4}\,\sigma_\theta^2 = (1-\bar\rho'_{hk})^2 + \sigma^2_{\rho'_{hk}}(M-2)(1-2\delta). \qquad (15)$$

The dimension of $\sigma_\theta^2$ for the "identical" design will be smaller than that for the "non-identical" design if the right side of equation (15) is less than one (except for a negligible factor of $(Mn-1)/Mn$). And this will certainly be true if $\delta > \tfrac12$.
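For the correlations of the numerical example ($\rho_1, \ldots, \rho_4 = .5, .4, .3, .2$) the primed quantities can be computed directly. This sketch (illustrative names; it assumes M ≥ 3 and a nonconstant revised matrix) removes the unit diagonal, solves the cluster relation above for $\delta$, and evaluates the left side of (14) from (8), the right side of (14), and the right side of (15). All three come to about .396, below one, consistent with the ratio .0634/.16.

```python
def pvar(values):
    """Population variance (divisor = number of values)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def compare_primed(rhos):
    """Left side of (14) computed from (8), and the right sides of (14) and (15)."""
    r = [1.0] + list(rhos)
    M = len(r)
    R = [[r[abs(h - k)] for k in range(M)] for h in range(M)]
    var_hk = pvar([v for row in R for v in row])
    var_h = pvar([sum(row) / M for row in R])
    lhs = M**2 / (M - 1) * (var_hk - 2 * var_h)
    # Revised matrix: each row keeps its M - 1 off-diagonal correlations.
    Rp = [[R[h][k] for k in range(M) if k != h] for h in range(M)]
    flat = [v for row in Rp for v in row]
    mean_p, var_p = sum(flat) / len(flat), pvar(flat)
    var_hp = pvar([sum(row) / (M - 1) for row in Rp])
    rhs14 = (1 - mean_p) ** 2 + M * var_p - 2 * (M - 1) * var_hp
    # Intra-class correlation from var_hp = var_p * [1 + (M - 2) * delta] / (M - 1)
    delta = ((M - 1) * var_hp / var_p - 1) / (M - 2)
    rhs15 = (1 - mean_p) ** 2 + var_p * (M - 2) * (1 - 2 * delta)
    return lhs, rhs14, rhs15, delta

print(compare_primed([0.5, 0.4, 0.3, 0.2]))
```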


SOME  PROPERTIES  OF  THE  1960  CENSUS  INTEGER  WEIGHTING  METHOD 

Joseph  F.  Daly 


The ratio estimation procedure for the 1960 census of population and housing assigned integral weights to the individual sample cases in a stratum in such a way that the sum of the weights was equal to the 100 percent count for that stratum. More generally, since the estimation was done at the tract level while the data were processed by enumeration district, it was necessary to devise a procedure for creating the integer "target numbers" at the enumeration district level. The problem can be stated thus:


"Given a set of integers $n_1, \ldots, n_k$ whose sum is n, find a set of integers $N_1, \ldots, N_k$ whose sum is a prescribed N in such a way that the $N_i$ are essentially proportional to the $n_i$."

We use the following procedure:

1) Select an integer $r_0$ at random from the set 0, 1, ..., n-1.

2) Add $r_0$ to the product $Nn_1$ and divide the sum by n, obtaining a quotient $N_1$ and a remainder $r_1$.

3) Add $r_1$ to the product $Nn_2$ and divide by n, obtaining a quotient $N_2$ and a remainder $r_2$, and so on.
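The procedure is ordinary integer division with the remainder carried forward. A minimal Python sketch follows; the function name and the returned pair (targets and remainders) are assumptions made for illustration.

```python
import random

def integer_targets(n_list, N, r0=None, rng=random):
    """Return (N_1..N_k, r_0..r_k) for the carried-remainder procedure.

    n_list: the integers n_1..n_k (their sum is n); N: the prescribed total.
    r0: starting remainder in 0..n-1, drawn at random when None.
    """
    n = sum(n_list)
    if r0 is None:
        r0 = rng.randrange(n)
    targets, remainders = [], [r0]
    r = r0
    for n_i in n_list:
        q, r = divmod(r + N * n_i, n)   # quotient N_i and the next remainder r_i
        targets.append(q)
        remainders.append(r)
    return targets, remainders

print(integer_targets([3, 5, 2], 17, r0=4))   # ([5, 9, 3], [4, 5, 0, 4])
```

For example, with $n_1, n_2, n_3 = 3, 5, 2$ (so n = 10), N = 17 and $r_0 = 4$, the targets are 5, 9, 3, which sum to 17, and the final remainder returns to 4.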

This process gives rise to the relations
$$\begin{aligned}
r_0 + Nn_1 &= N_1n + r_1\\
r_1 + Nn_2 &= N_2n + r_2\\
&\;\;\vdots\\
r_{k-1} + Nn_k &= N_kn + r_k
\end{aligned}\qquad (0 \le r_i < n) \qquad (A)$$

To show that
$$\sum_1^k N_i = N,$$
we first sum the above equations to get
$$N\sum_1^k n_i = n\sum_1^k N_i + (r_k - r_0).$$
Since $\sum n_i = n$, we see that $r_k - r_0$ must be divisible by n. But since both $r_k$ and $r_0$ are in the range (0, n-1), we must have
$$|r_k - r_0| < n,$$
so that
$$r_k - r_0 = 0.$$
Hence
$$Nn = n\sum_1^k N_i,$$
that is, $\sum N_i = N$.

Next we note that distinct choices of $r_0$ give rise to distinct values of $r_1$. For if we had
$$r_0 + Nn_1 = An + r_1$$
and
$$r_0' + Nn_1 = Bn + r_1,$$
we would have
$$r_0 - r_0' = (A-B)n,$$
so that $r_0 - r_0'$ would be divisible by n. Therefore $r_0 = r_0'$, as above. Thus as $r_0$ ranges over the set (0, 1, ..., n-1) so also does $r_1$ (and so, by the same argument, do $r_2, \ldots, r_k$).

Suppose now that $r_0$ is a chance variable that takes the values 0, 1, ..., n-1 with equal probability. Then $r_1, \ldots, r_k$ are also equidistributed over (0, ..., n-1). Under these conditions we have
$$E r_i = E r_j \qquad (i, j = 0, \ldots, k).$$
The relations (A) then imply
$$E N_i = \frac{Nn_i}{n}.$$


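Moreover, suppose
$$Nn_i = An + r \qquad (0 \le r < n).$$
Then
$$r_{i-1} + Nn_i = An + (r + r_{i-1}).$$
We therefore have, writing $[\,\cdot\,]$ for the integer part,
$$N_i = A = \Big[\frac{Nn_i}{n}\Big] \quad\text{if } r_{i-1} + r < n,$$
$$N_i = A + 1 = \Big[\frac{Nn_i}{n}\Big] + 1 \quad\text{if } r_{i-1} + r \ge n.$$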


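Consequently, if $r_0$ is uniformly distributed over (0, 1, ..., n-1), we see that
$$\Pr\Big\{N_i = \Big[\frac{Nn_i}{n}\Big]\Big\} = 1-\frac rn, \qquad \Pr\Big\{N_i = \Big[\frac{Nn_i}{n}\Big]+1\Big\} = \frac rn.$$
And the variance introduced by the randomizing process is
$$\sigma^2_{N_i} = \frac rn\Big(1-\frac rn\Big).$$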
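Because each $r_{i-1}$ runs over 0, ..., n-1 exactly once as $r_0$ does, the distribution of $N_i$ can be obtained exactly by enumerating $r_0$. This sketch uses exact fractions; the names are illustrative.

```python
from fractions import Fraction

def allocate(n_list, N, r0):
    """Quotients N_1..N_k of the carried-remainder procedure for a given r_0."""
    n, r, out = sum(n_list), r0, []
    for n_i in n_list:
        q, r = divmod(r + N * n_i, n)
        out.append(q)
    return out

def exact_moments(n_list, N):
    """Mean and variance of each N_i when r_0 is uniform on 0, ..., n - 1."""
    n = sum(n_list)
    runs = [allocate(n_list, N, r0) for r0 in range(n)]
    moments = []
    for i in range(len(n_list)):
        vals = [Fraction(run[i]) for run in runs]
        mean = sum(vals) / n
        moments.append((mean, sum((v - mean) ** 2 for v in vals) / n))
    return moments

for (mean, var), n_i in zip(exact_moments([3, 5, 2], 17), [3, 5, 2]):
    print(n_i, mean, var)
```

Each mean equals $Nn_i/n$ and each variance equals $(r/n)(1-r/n)$ with $r$ the remainder of $Nn_i$ on division by n.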


Suppose $N_i'$ is an integer chance variable whose expected value is
$$\frac{Nn_i}{n},$$
and suppose $N_i'$ has positive probability of taking on at least one value other than
$$\Big[\frac{Nn_i}{n}\Big]\quad\text{and}\quad\Big[\frac{Nn_i}{n}\Big]+1.$$
Then it is easy to see that
$$\sigma^2_{N_i'} > \sigma^2_{N_i}.$$

Since the probabilities above depend only on $Nn_i$ and n, the distribution of $N_i$ is independent of the order in which $n_1, \ldots, n_k$ are arranged.


MODEL  FOR  A  BINOMIAL  POPULATION  HAVING  SPECIFIED 
CORRELATIONS  OVER  TIME 

Margaret  Gurney 


THE  PROBLEM 

In studying a characteristic like unemployment over a period of time, it may be useful to construct a population with the following properties:

(a) For each month the proportion of elements having the characteristic is P.

(b) The correlation between elements for any two consecutive months is $\rho_1 \ge 0$.

(c) The correlation between elements for any two months which are m months apart (e.g. between the second and the (m+2)-th month) is $\rho_m \ge 0$.

The  conditions  under  which  it  is  possible 
to  construct  such  a  population  are  the  subject 
of  this  memorandum. 

TWO  MONTHS 

Given a (0,1) population consisting of N elements $\{x_i\}$, with P the proportion of "1's" (e.g., the persons having a desired characteristic) at the first month; at the succeeding month let the corresponding N elements (e.g., persons) be represented by $\{y_i\}$, let the corresponding proportion of "1's" be P also, and let the correlation between elements for the two months be $\rho_1$. Then
$$\rho_1 = \frac{\frac1N\sum_i^N x_iy_i - \bar x\bar y}{\sigma_x\sigma_y},$$
and since
$$\bar x = \bar y = P, \quad\text{and}\quad \sigma_x^2 = \sigma_y^2 = PQ \qquad (Q = 1-P),$$
we have
$$\frac1N\sum_i^N x_iy_i = P^2 + PQ\rho_1. \qquad (1)$$
That is, the proportion of elements which are "1" in both months is determined by equation (1).

Over two months there are four possible patterns for elements in the population, with frequencies $\{f_i\}$, i = 1, ..., 4:

    Pattern Number    $x_i$    $y_i$    Frequency
          1             1        1       $f_1$
          2             1        0       $f_2$
          3             0        1       $f_3$
          4             0        0       $f_4$

which are determined by
$$\sum_{i=1}^4 f_i = 1,\qquad f_1+f_2 = P,\qquad f_1+f_3 = P,\qquad f_1 = P^2+PQ\rho_1. \qquad (2)$$
These four equations can be solved uniquely to determine $\{f_i\}$, and we find
$$f_1 = P^2+PQ\rho_1,\qquad f_2 = f_3 = PQ(1-\rho_1),\qquad f_4 = Q^2+PQ\rho_1. \qquad (3)$$
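Equation (3) can be checked by recovering $\rho_1$ from the four frequencies. A minimal sketch with illustrative names; the values $P = .05$, $\rho_1 = .5$ are arbitrary.

```python
import math

def two_month_frequencies(P, rho1):
    """Equation (3): frequencies of the patterns 11, 10, 01, 00."""
    Q = 1 - P
    return [P * P + P * Q * rho1, P * Q * (1 - rho1), P * Q * (1 - rho1), Q * Q + P * Q * rho1]

def implied_correlation(f):
    """Recover the month-to-month correlation from the four frequencies."""
    p1, p2 = f[0] + f[1], f[0] + f[2]           # proportions of 1's in months 1 and 2
    return (f[0] - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

f = two_month_frequencies(0.05, 0.5)
print([round(v, 6) for v in f], implied_correlation(f))
```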


THREE MONTHS PATTERN

For a three month period, with $\rho_2$ the correlation between the first and third months and the patterns shown in table 1, we have the following conditions which must be satisfied by the eight frequencies corresponding to the eight (equals $2^3$) possible patterns; each frequency must, of course, be a non-negative fraction between 0 and 1.¹
$$\begin{aligned}
\sum_{i=1}^8 f_i &= 1\\
f_1+f_2+f_3+f_4 &= P\\
f_1+f_2+f_5+f_6 &= P\\
f_1+f_3+f_5+f_7 &= P\\
f_1+f_2 &= P^2+PQ\rho_1\\
f_1+f_5 &= P^2+PQ\rho_1\\
f_1+f_3 &= P^2+PQ\rho_2
\end{aligned}\qquad (4)$$
These seven equations are used to solve for the eight unknowns $\{f_i\}$; there is, therefore, some freedom in the determination of the frequencies of the various patterns. In the last equation
$$f_1+f_3 = P^2+PQ\rho_2$$
the maximum value of $f_1$ will occur when $f_3 = 0$; if we fix $f_3 = 0$ the system will be completely determined, subject to certain conditions which will be discussed below. The values of $\{f_i\}$ corresponding to a maximum value of $f_1$ at the third month are shown in column 4 of table 1.

Another relationship which can be obtained from table 1 is
$$f_1+f_8 = P^2+Q^2-PQ(1-2\rho_1-\rho_2),$$
which shows that the maximum value of $f_1$ is associated with a minimum value of $f_8$.

Entries for the minimum value of $f_1$ (in column 5) may be obtained by symmetry, by interchanging P and Q, and replacing $f_i$ by $f_{9-i}$.

The last column of table 1 is obtained by averaging the two preceding columns and shows an intermediate set of frequencies.

¹ These equations may be verified by considering the locations of the "1's" in column 2 of table 1.

Looking at the table, we find certain conditions which must be met if the frequencies are to be all non-negative.

(a) For Maximum $f_1$ (column 4) we must have

    (i) $1 \ge \rho_1 \ge \rho_2 \ge 0$
    (ii) $1-\rho_1 \ge \rho_1-\rho_2 \ge 0$                                    (5)
    (iii) $Q^2-PQ(1-2\rho_1) \ge 0$, or $\rho_1 \ge 1-\dfrac{1}{2P}$.

(b) For Minimum $f_1$ (column 5) conditions (i) and (ii) must be satisfied, and also

    (iv) $\rho_1 \ge 1-\dfrac{1}{2Q}$.

(c) For Average $f_1 = \frac12(\text{Max. } f_1 + \text{Min. } f_1)$ (column 6) conditions (i) and (ii) are necessary; and also both

    (v) $2\rho_1+\rho_2 \ge 1-\dfrac{2Q}{P}$

and

    (vi) $2\rho_1+\rho_2 \ge 1-\dfrac{2P}{Q}$.                                  (6)


The frequencies in column 4 (Maximum $f_1$) would be suitable when P is small, since condition (iii) will always be satisfied when P is less than .5; analogously, column 5 (Minimum $f_1$) should be used if P is large. For P close to one half, column 6 provides a symmetric set of frequencies, with
$$f_1 = f_8,\qquad f_2 = f_4 = f_5 = f_7,\qquad\text{and}\qquad f_3 = f_6.$$


Table 1.--Patterns and Frequencies for Three Months

    Pattern            Frequency
    Number   Form      $f_i$    Maximum $f_1$             Minimum $f_1$             Average $f_1$
    (1)      (2)       (3)      (4)                       (5)                       (6)
    1        111       $f_1$    $P^2+PQ\rho_2$            $P^2-PQ(1-2\rho_1)$       $P^2+\frac{PQ}{2}(-1+2\rho_1+\rho_2)$
    2        110       $f_2$    $PQ(\rho_1-\rho_2)$       $PQ(1-\rho_1)$            $\frac{PQ}{2}(1-\rho_2)$
    3        101       $f_3$    $0$                       $PQ(1-2\rho_1+\rho_2)$    $\frac{PQ}{2}(1-2\rho_1+\rho_2)$
    4        100       $f_4$    $PQ(1-\rho_1)$            $PQ(\rho_1-\rho_2)$       $\frac{PQ}{2}(1-\rho_2)$
    5        011       $f_5$    $PQ(\rho_1-\rho_2)$       $PQ(1-\rho_1)$            $\frac{PQ}{2}(1-\rho_2)$
    6        010       $f_6$    $PQ(1-2\rho_1+\rho_2)$    $0$                       $\frac{PQ}{2}(1-2\rho_1+\rho_2)$
    7        001       $f_7$    $PQ(1-\rho_1)$            $PQ(\rho_1-\rho_2)$       $\frac{PQ}{2}(1-\rho_2)$
    8        000       $f_8$    $Q^2-PQ(1-2\rho_1)$       $Q^2+PQ\rho_2$            $Q^2+\frac{PQ}{2}(-1+2\rho_1+\rho_2)$
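A sketch that builds columns (4)-(6) of Table 1 and measures how far each column departs from the seven equations (4). The parameter values in the example ($P = .3$, $\rho_1 = .6$, $\rho_2 = .4$) are arbitrary choices that satisfy conditions (i)-(vi); the check also confirms the symmetry used to obtain column 5 from column 4. Names are illustrative.

```python
from itertools import combinations

PATTERNS = ["111", "110", "101", "100", "011", "010", "001", "000"]   # f_1 ... f_8

def table1(P, r1, r2):
    """Columns (4), (5), (6) of Table 1 as lists f_1..f_8."""
    Q, PQ = 1 - P, P * (1 - P)
    max_f1 = [P * P + PQ * r2, PQ * (r1 - r2), 0.0, PQ * (1 - r1),
              PQ * (r1 - r2), PQ * (1 - 2 * r1 + r2), PQ * (1 - r1), Q * Q - PQ * (1 - 2 * r1)]
    min_f1 = [P * P - PQ * (1 - 2 * r1), PQ * (1 - r1), PQ * (1 - 2 * r1 + r2), PQ * (r1 - r2),
              PQ * (1 - r1), 0.0, PQ * (r1 - r2), Q * Q + PQ * r2]
    average = [(a + b) / 2 for a, b in zip(max_f1, min_f1)]
    return max_f1, min_f1, average

def max_violation(f, P, r1, r2):
    """Largest absolute violation of the seven equations (4)."""
    PQ = P * (1 - P)
    rho = {1: r1, 2: r2}                          # correlation by lag between months
    worst = abs(sum(f) - 1)
    for s in range(3):
        share = sum(v for v, p in zip(f, PATTERNS) if p[s] == "1")
        worst = max(worst, abs(share - P))
    for s, t in combinations(range(3), 2):
        joint = sum(v for v, p in zip(f, PATTERNS) if p[s] == "1" and p[t] == "1")
        worst = max(worst, abs(joint - (P * P + PQ * rho[t - s])))
    return worst

for name, col in zip(("maximum", "minimum", "average"), table1(0.3, 0.6, 0.4)):
    print(name, [round(v, 4) for v in col], max_violation(col, 0.3, 0.6, 0.4))
```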

FOUR MONTHS

For four months there are $2^4 = 16$ possible patterns. However, only 11 equations similar to equations (4) are found for determining the 16 frequencies $\{f_i\}$. As in the case of three months, there is freedom in determining (perhaps as many as five of) the $\{f_i\}$. One possible set of frequencies may be obtained as follows:

(a) Starting with the pattern for Maximum $f_1$ at month 3, let us assume that $f_1$ has its maximum possible value at month 4:
$$f_1 = P^2+PQ\rho_3;$$
then from the new equation (see the distribution of "1's" in column 1 of table 2)
$$f_1+f_3+f_5+f_7 = P^2+PQ\rho_3$$
we have
$$f_3 = f_5 = f_7 = 0.$$

(b) From the fact that $f_1$ and $f_2$ for the 4-month pattern must add to the $f_1$ for 3 months, and from similar relationships for $f_3$ and $f_4$, $f_5$ and $f_6$, etc., we can now determine $f_2$, $f_4$, $f_6$, and $f_8$.

(c) Finally, there is a symmetry of form in the patterns which is associated with equal frequencies: e.g., f(1110) = f(0111), or $f_2 = f_9$; this happens because mathematically there is no reason why the order of the columns should not be reversed, with the first column representing the most recent month. We find, also,
$$f_4 = f_{13},\qquad f_6 = f_{11},\qquad\text{and}\qquad f_8 = f_{15}.$$
We may now obtain the remaining $f_i$'s by using the relationships of two consecutive $f_i$'s for 4 months to a corresponding frequency from the 3-month pattern.


Table 2.--Comparison of 4-month Patterns, for Maximum $f_1$, with Patterns for 3-, 2-, and 1-Month

    Pattern   4 months                              3 months                        2 months                  1 month
    1111      $f_1 = P^2+PQ\rho_3$                  $f_1 = P^2+PQ\rho_2$            $f_1 = P^2+PQ\rho_1$      $f_1 = P$
    1110      $f_2 = PQ(\rho_2-\rho_3)$
    1101      $f_3 = 0$                             $f_2 = PQ(\rho_1-\rho_2)$
    1100      $f_4 = PQ(\rho_1-\rho_2)$
    1011      $f_5 = 0$                             $f_3 = 0$                       $f_2 = PQ(1-\rho_1)$
    1010      $f_6 = 0$
    1001      $f_7 = 0$                             $f_4 = PQ(1-\rho_1)$
    1000      $f_8 = PQ(1-\rho_1)$
    0111      $f_9 = PQ(\rho_2-\rho_3)$             $f_5 = PQ(\rho_1-\rho_2)$       $f_3 = PQ(1-\rho_1)$      $f_2 = Q$
    0110      $f_{10} = PQ(\rho_1-2\rho_2+\rho_3)$
    0101      $f_{11} = 0$                          $f_6 = PQ(1-2\rho_1+\rho_2)$
    0100      $f_{12} = PQ(1-2\rho_1+\rho_2)$
    0011      $f_{13} = PQ(\rho_1-\rho_2)$          $f_7 = PQ(1-\rho_1)$            $f_4 = Q^2+PQ\rho_1$
    0010      $f_{14} = PQ(1-2\rho_1+\rho_2)$
    0001      $f_{15} = PQ(1-\rho_1)$               $f_8 = Q^2-PQ(1-2\rho_1)$
    0000      $f_{16} = Q^2-PQ(2-3\rho_1)$

Each 3-month frequency covers the 4-month pattern on its line and the one below it; each 2-month frequency covers four lines, and each 1-month frequency covers eight.

Table 2 shows a comparison of the frequencies for 4 months, developed according to the discussion above, with the frequencies for 3, 2, or 1 months, when we make $f_1$ as large as possible at each month.

In order that all the 16 frequencies for 4 months be non-negative, we must have the following conditions:

    (i) $1 \ge \rho_1 \ge \rho_2 \ge \rho_3$
    (ii) $1-\rho_1 \ge \rho_1-\rho_2 \ge \rho_2-\rho_3$                          (7)
    (iii) $Q^2-PQ(2-3\rho_1) \ge 0$, or $\rho_1 \ge 1-\dfrac{1}{3P}$.

We may obtain frequency tables also for the case where $f_1$ is as small as possible, or for a weighted average of the tables for Maximum $f_1$ and Minimum $f_1$, with appropriate changes in condition (iii) above.
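The 4-month frequencies of Table 2 can be checked against the 11 defining equations (total, four monthly proportions, six pairwise joint proportions). A sketch with illustrative names and example values $P = .05$, $\rho_1, \rho_2, \rho_3 = .5, .4, .3$, chosen to satisfy (7).

```python
from itertools import combinations, product

def four_month_max_f1(P, r1, r2, r3):
    """Table 2, 4-month column: frequency for every pattern (month 1 first)."""
    Q, PQ = 1 - P, P * (1 - P)
    nonzero = {
        "1111": P * P + PQ * r3, "1110": PQ * (r2 - r3), "1100": PQ * (r1 - r2),
        "1000": PQ * (1 - r1), "0111": PQ * (r2 - r3), "0110": PQ * (r1 - 2 * r2 + r3),
        "0100": PQ * (1 - 2 * r1 + r2), "0011": PQ * (r1 - r2),
        "0010": PQ * (1 - 2 * r1 + r2), "0001": PQ * (1 - r1),
        "0000": Q * Q - PQ * (2 - 3 * r1),
    }
    return {"".join(b): nonzero.get("".join(b), 0.0) for b in product("10", repeat=4)}

def check_equations(f, P, rhos):
    """Largest violation of the 11 equations: total, 4 monthly and 6 pairwise proportions."""
    PQ = P * (1 - P)
    worst = abs(sum(f.values()) - 1)
    for s in range(4):
        worst = max(worst, abs(sum(v for k, v in f.items() if k[s] == "1") - P))
    for s, t in combinations(range(4), 2):
        joint = sum(v for k, v in f.items() if k[s] == "1" and k[t] == "1")
        worst = max(worst, abs(joint - (P * P + PQ * rhos[t - s - 1])))
    return worst

f = four_month_max_f1(0.05, 0.5, 0.4, 0.3)
print(check_equations(f, 0.05, [0.5, 0.4, 0.3]))
```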
Table 3.--Minimum Value of $\rho_1$ for Various $(P, m)$ and Maximum $f_1$

                              Number of months (m)
    P        4       5       7       9       11      16      21      51
    .50    .333    .500    .667    .750    .800    .933    .950    .960
    .33      0     .250    .500    .625    .733    .800    .850    .940
    .25      0       0     .333    .500    .600    .750    .800    .920
    .20      0       0     .167    .375    .500    .667    .750    .900
    .10      0       0       0       0       0     .333    .500    .800
    .05      0       0       0       0       0       0       0     .600
    .02      0       0       0       0       0       0       0       0

GENERALIZATION

A format for generalization to more than 4 months may now be obtained. Considering only the frequencies associated with maximum values of $f_1$ at each month, we have the following conditions for a pattern over "m" months:

    (i) $1 \ge \rho_1 \ge \rho_2 \ge \cdots \ge \rho_{m-1}$
    (ii) $1-\rho_1 \ge \rho_1-\rho_2 \ge \cdots \ge \rho_{m-2}-\rho_{m-1}$

That is, the $\rho_i$ are monotonically decreasing, as are also the first differences $\rho_i-\rho_{i+1}$.   (8)

Also

    (iii) $Q^2-PQ\big[(m-2)-(m-1)\rho_1\big] \ge 0$, or $\rho_1 \ge 1-\dfrac{1}{(m-1)P}$.

For the Minimum $f_1$, condition (iii) will be replaced by

    (iv) $\rho_1 \ge 1-\dfrac{1}{(m-1)Q}$,

and for a weighted average of Maximum $f_1$ and Minimum $f_1$, another condition (v) must be developed, which will involve $\rho_{m-1}$ as well as $\rho_1$.

It is evident from equations (8) that the feasibility of finding possible frequencies depends on the proportion P, on the correlations $\{\rho_i\}$, and on the number of months in the pattern. Table 3 shows the limits on $\rho_1$ for a pattern based on Maximum $f_1$. If we use the Minimum $f_1$, the table would be of the same form with P replaced by Q.

Note that for P close to .50 the limitation on the value of $\rho_1$ becomes quite restrictive after a small number of months. For a P in this neighborhood a somewhat better situation will be obtained if we replace Maximum $f_1$ by an average of Maximum $f_1$ and Minimum $f_1$.
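Condition (iii) can be evaluated for any (P, m), and the results can be set beside Table 3. A sketch with an illustrative function name; the row labelled .33 is taken here as $P = 1/3$, an assumption that reproduces the tabulated .250 at m = 5. For the Minimum $f_1$, pass Q in place of P.

```python
def min_rho1(P, m):
    """Smallest rho_1 allowed by condition (iii) for Maximum f_1 over m months (at least 0)."""
    return max(0.0, 1 - 1 / ((m - 1) * P))

months = (4, 5, 7, 9, 11, 16, 21, 51)
for P in (0.50, 1 / 3, 0.25, 0.20, 0.10, 0.05, 0.02):
    print(f"{P:.2f}  " + "  ".join(f"{min_rho1(P, m):.3f}" for m in months))
```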
Table 4.--Comparison of Patterns for Six and Five Months, for Maximum $f_1$

    6 months                                                    5 months
    Pattern   $f_i$      Frequency                              Pattern   $f_i$      Frequency
    (1)       (2)        (3)                                    (4)       (5)        (6)
    111111    $f_1$      $P^2+PQ\rho_5$                 }
    111110    $f_2$      $PQ(\rho_4-\rho_5)$            }       11111     $f_1$      $P^2+PQ\rho_4$
    111100    $f_4$      $PQ(\rho_3-\rho_4)$                    11110     $f_2$      $PQ(\rho_3-\rho_4)$
    111000    $f_8$      $PQ(\rho_2-\rho_3)$                    11100     $f_4$      $PQ(\rho_2-\rho_3)$
    110000    $f_{16}$   $PQ(\rho_1-\rho_2)$                    11000     $f_8$      $PQ(\rho_1-\rho_2)$
    100000    $f_{32}$   $PQ(1-\rho_1)$                         10000     $f_{16}$   $PQ(1-\rho_1)$
    011111    $f_{33}$   $PQ(\rho_4-\rho_5)$            }
    011110    $f_{34}$   $PQ(\rho_3-2\rho_4+\rho_5)$    }       01111     $f_{17}$   $PQ(\rho_3-\rho_4)$
    011100    $f_{36}$   $PQ(\rho_2-2\rho_3+\rho_4)$            01110     $f_{18}$   $PQ(\rho_2-2\rho_3+\rho_4)$
    011000    $f_{40}$   $PQ(\rho_1-2\rho_2+\rho_3)$            01100     $f_{20}$   $PQ(\rho_1-2\rho_2+\rho_3)$
    010000    $f_{48}$   $PQ(1-2\rho_1+\rho_2)$                 01000     $f_{24}$   $PQ(1-2\rho_1+\rho_2)$
    001111    $f_{49}$   $PQ(\rho_3-\rho_4)$            }
    001110    $f_{50}$   $PQ(\rho_2-2\rho_3+\rho_4)$    }       00111     $f_{25}$   $PQ(\rho_2-\rho_3)$
    001100    $f_{52}$   $PQ(\rho_1-2\rho_2+\rho_3)$            00110     $f_{26}$   $PQ(\rho_1-2\rho_2+\rho_3)$
    001000    $f_{56}$   $PQ(1-2\rho_1+\rho_2)$                 00100     $f_{28}$   $PQ(1-2\rho_1+\rho_2)$
    000111    $f_{57}$   $PQ(\rho_2-\rho_3)$            }
    000110    $f_{58}$   $PQ(\rho_1-2\rho_2+\rho_3)$    }       00011     $f_{29}$   $PQ(\rho_1-\rho_2)$
    000100    $f_{60}$   $PQ(1-2\rho_1+\rho_2)$                 00010     $f_{30}$   $PQ(1-2\rho_1+\rho_2)$
    000011    $f_{61}$   $PQ(\rho_1-\rho_2)$            }
    000010    $f_{62}$   $PQ(1-2\rho_1+\rho_2)$         }       00001     $f_{31}$   $PQ(1-\rho_1)$
    000001    $f_{63}$   $PQ(1-\rho_1)$                 }
    000000    $f_{64}$   $Q^2-PQ(4-5\rho_1)$            }       00000     $f_{32}$   $Q^2-PQ(3-4\rho_1)$

Table 4 above shows the frequencies for six and five months for Maximum $f_1$, and indicates the manner in which frequencies can be obtained for more than six months. The patterns are listed in decreasing binary order; the frequencies for all patterns which are not shown are zero.

Remarks: (1) The number of patterns at month "m" is equal to $2^m$.

(2) From table 3, it can be seen that frequencies for a 7-month model (using Maximum $f_1$) could be found, for P = .5, only if $\rho_1$ is at least equal to .667. For P = .1, however, a solution would always be possible.
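Tables 2 and 4 suggest a general rule for the Maximum $f_1$ model over m months: only patterns consisting of a single run of 1's, plus the all-zero pattern, have positive frequency; a run filling all m months has frequency $P^2+PQ\rho_{m-1}$; a run of length L touching one end has $PQ(\rho_{L-1}-\rho_L)$; an interior run has $PQ(\rho_{L-1}-2\rho_L+\rho_{L+1})$ (with $\rho_0 = 1$); and the all-zero pattern takes the remainder $Q^2-PQ[(m-2)-(m-1)\rho_1]$. This rule is inferred from the tables and should be treated as an assumption. The sketch below builds the model from it (illustrative names) and checks every monthly proportion and every pairwise joint proportion.

```python
from itertools import combinations

def max_f1_model(P, rhos):
    """Inferred Maximum-f_1 model over m = len(rhos) + 1 months (an assumption, see text).

    Only patterns with a single run of 1's, plus the all-zero pattern, get positive frequency.
    """
    m = len(rhos) + 1
    Q, PQ = 1 - P, P * (1 - P)
    rho = [1.0] + list(rhos)                      # rho[0] = 1, rho[L] = lag-L correlation
    f = {}
    for a in range(m):
        for b in range(a, m):
            L = b - a + 1                         # run of 1's in months a..b (0-based)
            key = "0" * a + "1" * L + "0" * (m - 1 - b)
            if a == 0 and b == m - 1:             # run fills every month
                f[key] = P * P + PQ * rho[m - 1]
            elif a == 0 or b == m - 1:            # run touches one end
                f[key] = PQ * (rho[L - 1] - rho[L])
            else:                                 # interior run
                f[key] = PQ * (rho[L - 1] - 2 * rho[L] + rho[L + 1])
    f["0" * m] = Q * Q - PQ * ((m - 2) - (m - 1) * rho[1])
    return f

def max_violation(f, P, rhos):
    """Largest violation of the total, monthly-proportion and pairwise-joint conditions."""
    m = len(rhos) + 1
    PQ = P * (1 - P)
    worst = abs(sum(f.values()) - 1)
    for s in range(m):
        worst = max(worst, abs(sum(v for k, v in f.items() if k[s] == "1") - P))
    for s, t in combinations(range(m), 2):
        joint = sum(v for k, v in f.items() if k[s] == "1" and k[t] == "1")
        worst = max(worst, abs(joint - (P * P + PQ * rhos[t - s - 1])))
    return worst

model = max_f1_model(0.05, [0.5, 0.4, 0.3, 0.25, 0.2])    # six months
print(len(model), max_violation(model, 0.05, [0.5, 0.4, 0.3, 0.25, 0.2]))
```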
ILLUSTRATION - UNEMPLOYED PERSONS

As an illustration, consider the problem of setting up a universe to represent the proportion of persons in the labor force who are unemployed. For such a population over a 5-month period the following values of the parameters may be assumed:²
$$P = .05,\qquad \rho_1 = .50,\qquad \rho_2 = .40,\qquad \rho_3 = .30,\qquad \rho_4 = .25. \qquad (9)$$
There are 32 possible patterns for a particular element: a "1" means that the person is unemployed during the month corresponding to the column in which the "1" occurs, while a "0" means that he is employed. For example, the pattern 11100 is associated with a person who was unemployed during the first 3 months, and employed during the last two.

Table 5 shows, for Maximum $f_1$ and the parameters assumed above, the frequencies of the various possible patterns. It may be noted that, using Maximum $f_1$, a person who is unemployed at some time during the period may become employed, but the patterns in which he might become unemployed again have their frequencies equal to zero; for example, a pattern such as 11011 or 00101 has its "f" = 0. The use of Minimum $f_1$ would similarly restrict other patterns to having their frequencies equal to zero.

Table 5.--Model for Unemployment Over a 5-month Period

Pattern              Frequency
Number   Form        Symbol               Value
(1)      (2)         (3)                  (4)
1        11111       $f_1$                .014375
2        11110       $f_2$                .002375
3        11101       $f_3$                0
4        11100       $f_4$                .004750
5-7      --          $f_5$ - $f_7$        0
8        11000       $f_8$                .004750
9-15     --          $f_9$ - $f_{15}$     0
16       10000       $f_{16}$             .023750
17       01111       $f_{17}$             .002375
18       01110       $f_{18}$             .002375
19       01101       $f_{19}$             0
20       01100       $f_{20}$             0
21-23    --          $f_{21}$ - $f_{23}$  0
24       01000       $f_{24}$             .019000
25       00111       $f_{25}$             .004750
26       00110       $f_{26}$             0
27       00101       $f_{27}$             0
28       00100       $f_{28}$             .019000
29       00011       $f_{29}$             .004750
30       00010       $f_{30}$             .019000
31       00001       $f_{31}$             .023750
32       00000       $f_{32}$             .855000

2/ These are reasonable values of $\{\rho_i\}$ if an unbiased estimate is to be used. If we use a composite estimate of the form employed in the Current Population Survey, we may wish to take $\rho_1 = .42$, $\rho_2 = .26$, $\rho_3 = .14$ and $\rho_4 = .07$.

As a check, notice that in table 5 the sum of the frequencies of patterns having "1" in column 1 is equal to P (= .05), as is also the sum for patterns having "1" in column 2 (or 3, or 4, or 5). Similarly, the sum of the frequencies of patterns having "1" in both columns 1 and 2 is .02625, which is equal to

$$P^2 + PQ\rho_1.$$

Other similar computations show that the model fits all the conditions (similar to equations (4)) imposed by the values of P and $\{\rho_i\}$.

It is interesting to see that, for the parameters of equations (9), 1.4 percent of the persons are unemployed during all 5 months, while the average number of persons unemployed is 5 percent; on the other hand, 85.5 percent are employed during all 5 months, out of the 95 percent who are employed in each month.
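The check described above can be mechanized. The Python sketch below is an illustration, not part of the memorandum: it stores the non-zero frequencies of table 5, taking .014375 for pattern 11111 (the value $P^2 + PQ\rho_4$ gives, since under Maximum $f_1$ this is the only pattern with "1" in both months 1 and 5), and tests every marginal against P and every pair of months i apart against $P^2 + PQ\rho_i$. The helper names are hypothetical.

```python
from itertools import combinations

# Parameters of equations (9).
P = 0.05
Q = 1 - P
RHO = {1: 0.50, 2: 0.40, 3: 0.30, 4: 0.25}

# Non-zero frequencies of table 5 (Maximum f_1); all other patterns have f = 0.
TABLE_5 = {
    "11111": 0.014375, "11110": 0.002375, "11100": 0.004750,
    "11000": 0.004750, "10000": 0.023750, "01111": 0.002375,
    "01110": 0.002375, "01000": 0.019000, "00111": 0.004750,
    "00100": 0.019000, "00011": 0.004750, "00010": 0.019000,
    "00001": 0.023750, "00000": 0.855000,
}


def joint_frequency(table, months):
    """Sum of f over patterns having "1" in every listed month (0-based)."""
    return sum(f for pattern, f in table.items()
               if all(pattern[m] == "1" for m in months))


def check_model(table, tol=1e-9):
    """Return the conditions that fail, as (condition, value, target) tuples."""
    failures = []
    total = sum(table.values())
    if abs(total - 1) > tol:
        failures.append(("sum of f", total, 1.0))
    n = len(next(iter(table)))
    for m in range(n):
        value = joint_frequency(table, [m])
        if abs(value - P) > tol:
            failures.append((f"month {m + 1}", value, P))
    for a, b in combinations(range(n), 2):
        target = P * P + P * Q * RHO[b - a]
        value = joint_frequency(table, [a, b])
        if abs(value - target) > tol:
            failures.append((f"months {a + 1},{b + 1}", value, target))
    return failures


print(check_model(TABLE_5))  # an empty list means every condition holds
```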

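EXTENSIONS

It is not necessary that P or $\{\rho_i\}$ be the same for each month or combination of months. For example, for 3 months we might have proportions $P_1$, $P_2$, and $P_3$, and correlations $\rho_{(1,2)}$, $\rho_{(2,3)}$, and $\rho_{(1,3)}$.

Equations (4) would then be replaced by

$$
\begin{aligned}
\sum_{i=1}^{8} f_i &= 1\\
f_1 + f_2 + f_3 + f_4 &= P_1\\
f_1 + f_2 + f_5 + f_6 &= P_2\\
f_1 + f_3 + f_5 + f_7 &= P_3\\
f_1 + f_2 &= P_1P_2 + \sigma_1\sigma_2\rho_{(1,2)}\\
f_1 + f_5 &= P_2P_3 + \sigma_2\sigma_3\rho_{(2,3)}\\
f_1 + f_3 &= P_1P_3 + \sigma_1\sigma_3\rho_{(1,3)}
\end{aligned}
\tag{10}
$$

where $Q_i = 1 - P_i$ and $\sigma_i = \sqrt{P_iQ_i}$ is the standard deviation for month i.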

Conditions (5) would be replaced, for Maximum $f_1$, by

$$
\begin{aligned}
&\text{(i)} && 1 \ge \rho_{(1,2)} \ge \rho_{(1,3)} \ge 0, \qquad 1 \ge \rho_{(2,3)} \ge \rho_{(1,3)} \ge 0;\\
&\text{(ii)} && 1 - \rho_{(1,2)} \ge \rho_{(1,2)} - \rho_{(1,3)} \ge 0, \qquad 1 - \rho_{(2,3)} \ge \rho_{(2,3)} - \rho_{(1,3)} \ge 0;\\
&\text{(iii)} && \sigma_2\left(\sigma_1\rho_{(1,2)} + \sigma_3\rho_{(2,3)}\right) \ge Q_2(P_1 + P_3 - 1).
\end{aligned}
\tag{11}
$$

There would perhaps be other conditions on the $\{P_i\}$ and the correlations, in order that all the frequencies would be non-negative. Similar to table 1, we should have, for Maximum $f_1$:
Table 6.--Three-month Model with Varying Proportions and Correlations

Pattern           Frequency, for Maximum $f_1$
Number   Form     $f_i$     Form
(1)      (2)      (3)       (4)
1        111      $f_1$     $P_1P_3 + \sigma_1\sigma_3\rho_{(1,3)}$
2        110      $f_2$     $P_1(P_2 - P_3) + \sigma_1(\sigma_2\rho_{(1,2)} - \sigma_3\rho_{(1,3)})$
3        101      $f_3$     $0$
4        100      $f_4$     $P_1Q_2 - \sigma_1\sigma_2\rho_{(1,2)}$
5        011      $f_5$     $P_3(P_2 - P_1) + \sigma_3(\sigma_2\rho_{(2,3)} - \sigma_1\rho_{(1,3)})$
6        010      $f_6$     $Q_1P_2 - P_3(P_2 - P_1) - \sigma_1\sigma_2\rho_{(1,2)} - \sigma_3(\sigma_2\rho_{(2,3)} - \sigma_1\rho_{(1,3)})$
7        001      $f_7$     $P_3Q_2 - \sigma_2\sigma_3\rho_{(2,3)}$
8        000      $f_8$     $Q_2(Q_1 - P_3) + \sigma_2(\sigma_1\rho_{(1,2)} + \sigma_3\rho_{(2,3)})$

The extension to more than three months obviously leads to frequencies with more, and perhaps quite restrictive, conditions on the $\{P_i\}$ and the correlations. If there is some interest in a population of this more general sort, further work can be done on the model.
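As with table 5, the forms in table 6 can be checked against equations (10). The sketch below assumes $\sigma_i = \sqrt{P_iQ_i}$, the standard deviation of a 0-1 variable with proportion $P_i$, and uses trial proportions and correlations chosen only for illustration; the function names are hypothetical.

```python
import math
from itertools import combinations


def table_6(P, rho12, rho23, rho13):
    """Frequencies of table 6 (Maximum f_1) for proportions P = (P1, P2, P3)."""
    P1, P2, P3 = P
    Q1, Q2 = 1 - P1, 1 - P2
    s1, s2, s3 = (math.sqrt(p * (1 - p)) for p in P)  # assumed sigma_i
    return {
        "111": P1 * P3 + s1 * s3 * rho13,
        "110": P1 * (P2 - P3) + s1 * (s2 * rho12 - s3 * rho13),
        "101": 0.0,
        "100": P1 * Q2 - s1 * s2 * rho12,
        "011": P3 * (P2 - P1) + s3 * (s2 * rho23 - s1 * rho13),
        "010": (Q1 * P2 - P3 * (P2 - P1) - s1 * s2 * rho12
                - s3 * (s2 * rho23 - s1 * rho13)),
        "001": P3 * Q2 - s2 * s3 * rho23,
        "000": Q2 * (Q1 - P3) + s2 * (s1 * rho12 + s3 * rho23),
    }


def joint(freqs, months):
    """Total frequency of patterns with "1" in every listed month (0-based)."""
    return sum(f for pattern, f in freqs.items()
               if all(pattern[m] == "1" for m in months))


def satisfies_equations_10(freqs, P, rho, tol=1e-12):
    """Total 1, marginals P_i, and joints P_i P_j + sigma_i sigma_j rho_(i,j);
    rho maps (i, j), months numbered from 1, to the correlation."""
    s = [math.sqrt(p * (1 - p)) for p in P]
    ok = abs(sum(freqs.values()) - 1) < tol
    ok = ok and all(abs(joint(freqs, [i]) - P[i]) < tol for i in range(3))
    for i, j in combinations(range(3), 2):
        target = P[i] * P[j] + s[i] * s[j] * rho[i + 1, j + 1]
        ok = ok and abs(joint(freqs, [i, j]) - target) < tol
    return ok


P = (0.05, 0.06, 0.07)                          # trial proportions
rho = {(1, 2): 0.5, (2, 3): 0.5, (1, 3): 0.4}   # trial correlations
f = table_6(P, rho[1, 2], rho[2, 3], rho[1, 3])
print(satisfies_equations_10(f, P, rho), all(v >= 0 for v in f.values()))
```

The equalities hold for any inputs, since they are algebraic identities; non-negativity of the frequencies is what restricts the choice of $\{P_i\}$ and correlations.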


MODEL  FOR  A  MULTIVARIATE  NORMAL  POPULATION  HAVING 
SPECIFIED  CORRELATIONS  OVER  TIME 

Margaret  Gurney 


SUMMARY 

This memorandum indicates a way of constructing a multivariate normal population with specified correlations over time, using a Monte Carlo process. It is assumed that the population is stationary; that is, the variances are equal at each time period, and the correlation between observations i months apart is the same, whatever pair of months is compared.

A  UNIVAC  1105  program  has  been  developed 
which  will 

(a)  Construct  such  a  population  over  a 
period  of  7  months, 

(b)  Print  out  seven  Random  Normal  Values 
for  7  successive  months,  for  each 
iteration  of  the  Monte  Carlo  process, 

(c) Count the number of increases in the values created, month-to-month, and over 4 months and 6 months, for all iterations,

(d)  Compute  variances  of  the  values,  and 
also  third  and  fourth  moments  (as  a 
test  of  normality),  for  each  month  and 
for  the  average  of  all  7  months. 

PROGRAM  OPTIONS 

The program has been written so as to permit several choices in the construction of the population. The input to the program is by means of paper tape, on which it is possible to specify

(a)  The  mean  at  each  month, 

(b)  The  Standard  Deviation,  which  is 
assumed  to  be  the  same  for  each 
month, 


(c) The correlations, $\rho_1, \rho_2, \ldots, \rho_6$,

(d) The number of iterations of the Monte Carlo process,

(e) The random number used to start the process,

(f) Whether the variance computation is to be performed, or not.

Results of running the program for a population simulating the estimate of unemployment from the Current Population Survey (CPS) are shown in tables A and B of the Appendix.

(a) The correlations used are those from the composite estimate of the CPS, with equal weights on the two parts of the estimate (K = 1/2).

(b)  Four  different  patterns  of  growth 
are  assumed,  with  the  level  of 
unemployment  assumed  to  be  about 
5.5  percent. 

(1) No change over 7 months.

(2) Increase of 1/10th percentage point per month.

(3) Increase of 1/6th percentage point per month.

(4) Increase of 1/4th percentage point per month.

(c)  The  standard  deviation  of  the 
estimate  of  unemployment,  at  the 
level  of  April  1962,    is 

1/710  =  .001416. 
BACKGROUND

In the "Economic Report of the President," transmitted to the Congress in January 1962, there is a recommendation for "Stand-by capital improvement authority," with the following reference to the estimate of unemployment from the CPS survey:

"The  President  would  be  authorized  to 
initiate  the  program  within  two  months  after 
the  seasonally  adjusted  unemployment  rate 

(a) had risen in at least three out of four months (or in four out of six months) and

(b) had risen to a level at least one percentage point higher than its level 4 months (or 6 months) earlier."
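One possible reading of the quoted criterion, written as a Python function, is sketched below. The interpretation is an assumption: condition (a) is taken to count rises among the last four (or six) month-to-month changes, condition (b) compares the latest rate with the rate four (or six) months earlier, and the two-month initiation lag is ignored. The function name and its parameters are hypothetical.

```python
def criterion_met(rates, window=4, rises_needed=3, threshold=1.0):
    """Hypothetical reading of the stand-by criterion.

    rates: monthly seasonally adjusted unemployment rates in percent, oldest
    first. (a) the rate rose in at least `rises_needed` of the last `window`
    month-to-month changes; (b) the latest rate is at least `threshold`
    percentage points above the rate `window` months earlier.
    """
    if len(rates) < window + 1:
        raise ValueError("need at least window + 1 monthly rates")
    recent = rates[-(window + 1):]
    rises = sum(later > earlier for earlier, later in zip(recent, recent[1:]))
    return rises >= rises_needed and recent[-1] - recent[0] >= threshold


# "three out of four months" and "four out of six months" variants
print(criterion_met([5.5, 5.8, 6.1, 6.3, 6.5]))
print(criterion_met([5.0, 5.2, 5.1, 5.4, 5.6, 5.8, 6.0], window=6, rises_needed=4))
```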

In view of the fact that the estimate of unemployment is correlated from month to month, and that there is sampling error in the estimate at each month, several questions arise, such as:

(1) How frequently might the criterion signal change when there was, in fact, no change?

(2) How frequently might the conditions of the criterion be met when there was in fact only a small change?

(3) How frequently might the criterion fail to detect a true change?

To estimate the variance of the seasonally adjusted unemployment rate directly is a formidable, if not an impossible, task. As an alternative, a Monte Carlo procedure has been devised which simulates the behavior of the estimate of the seasonally adjusted unemployment rate. It is assumed that the seasonally adjusted unemployment estimate is approximately the mean of a normal distribution, with known variance, and with specified correlations over time. A mathematical model may be constructed with the same characteristics, and successive samples may be selected; the number and directions of changes may be counted in each sample, and a tally kept which will provide an estimate of the results which would be obtained if we were indeed able to get the information directly from the CPS sample.

INTRODUCTION 

The preceding memorandum suggested a method for constructing a binomial population having specified correlations over time. This was done by considering all possible permutations of the digits (0,1) in a number consisting of seven digits (to correspond to a period of seven months), and determining an acceptable frequency of occurrence for each pattern, such that the correlations over time should have specified values between 0 and 1.

The  present  memorandum  considers  the 
problem  of  constructing  a  multivariate  normal 
distribution  having  specified  time-correlations. 
A  different  approach  is  employed,  using  a  Monte 
Carlo  procedure.  A  specified  proportion  of  the 
random  normal  deviates  created  for  one  month 
are  copied,  to  get  the  corresponding  deviates 
for  the  next  month;  the  copying  rules  must  be 
such  that  the  desired  correlations  over  time 
will  be  obtained,  and  that  the  population 
continues  to  be  normal  after  copying.  The 
development  of  the  copying  rules  is  a  major 
part  of  this  memorandum. 

The  method  employed  here  requires  certain 
limitations  on  the  correlations,  since  the 
frequency  of  each  copying  pattern  must  be 
non-negative. 
SPECIFICATIONS OF THE MODEL

We wish to create a sequence of correlated normal distributions, over a period of several months. The characteristics of the population are:

(a) For each month the mean is 0.

(b) The variance at each month is 1.

(c) The correlation between elements for any 2 consecutive months is $\rho_1$.

(d) The correlation between elements which are i months apart (e.g., between month m and month m+i) is $\rho_i$.

Conditions b, c, and d are conditions for the stationarity of the process; in practice they may not be satisfied over a long period of time, but they may be appropriate for a short period of time (such as 7 months).

Other populations may be obtained from this standardized one by linear transformations, so that the means and variances need not be restricted to 0 and 1; the mean may also vary from month to month.

METHOD  OF  CONSTRUCTION  OF  THE  POPULATION 

(a) First construct a number which is a standardized random normal deviate N(0,1). This can be done by constructing pseudo-random numbers which are rectangularly distributed, and adding a suitable number of them. Each rectangular number on (0,1) has mean 1/2 and variance 1/12, so the addition of 12 such numbers leads to an approximately normal distribution with mean 6 and unit variance; the adjustment to a standardized distribution is made by subtracting 6.

(b) Construct a set of seven such numbers, using the procedure above, one for each of the 7 months over which we shall be investigating the correlations.

(c) Copy (or don't copy) the number from 1 month to the next according to a copying rule. (Copying from 1 month to the next will wipe out the number already computed for the second of the 2 months.)

The probability with which a particular copying pattern is used will be determined by the correlations imposed upon the population.
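Steps (a) and (b) can be sketched in Python as follows. The seeded generator stands in for the starting random number of the program options; `random.Random` is used here only as an example of a rectangular pseudo-random source, and the function names are illustrative.

```python
import random


def approx_normal_deviate(rng):
    """Approximate N(0,1) deviate from 12 rectangular (uniform 0-1) numbers.

    Each uniform number has mean 1/2 and variance 1/12, so their sum has
    mean 6 and variance 1; subtracting 6 standardizes it.
    """
    return sum(rng.random() for _ in range(12)) - 6.0


def seven_month_start(rng):
    """Step (b): one independent deviate for each of the 7 months."""
    return [approx_normal_deviate(rng) for _ in range(7)]


rng = random.Random(1962)  # plays the role of the starting random number
sample = [approx_normal_deviate(rng) for _ in range(50_000)]
mean = sum(sample) / len(sample)
variance = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
print(round(mean, 3), round(variance, 3))  # close to 0 and 1
print(seven_month_start(rng))
```

The deviates produced this way never fall outside the range -6 to 6, which is one way in which the 12-number sum only approximates a normal distribution.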

DEVELOPMENT  OF  COPYING  RULES 

The next few sections will develop the copying rules for 2 months, 3 months, 4 months, etc. For 4 or more months, some choice in the frequency of copying can be made, as will be discussed later.

For 2 months, a standardized bivariate normal population, with correlation $\rho_1$ between sample pairs, can be constructed as follows:

(a) Generate a standardized Random Normal Deviate (RND); this is the value for the first month.

(b) Copy this same RND with probability $\rho_1$, to obtain the value for the second month.

OR

(c) Generate a new RND with probability $1 - \rho_1$; use this number as the value for the second month; it is independent of the number generated for the first month.

This procedure can be described by a "Copying Rule." Let "1" mean "copy" and "0" mean "select a new RND." Then the copying is characterized by the statement that "copying rule 1 has frequency $\rho_1$," while "copying rule 0 has frequency $1 - \rho_1$." The patterns for two months can be described by a table:


Pattern   Month      Copying   Frequency
number    1    2     rule
1         x    x     1         $f_1 = \rho_1$
2         x    y     0         $f_2 = 1 - \rho_1$

We know that each month has a distribution which is N(0,1), and x and y are independent, so that $E(x\cdot x) = \sigma^2 = 1$ and $E(x\cdot y) = 0$; hence

$$\operatorname{cov}(1,2) = E\{\rho_1\,x\cdot x + (1-\rho_1)\,x\cdot y\} = \rho_1\sigma^2 = \rho_1.$$

That is, the correlation condition is satisfied.

There is one limitation on the correlation ($\rho_1$); since the frequency of copying (or not copying) cannot be negative, we must have

$$0 \le \rho_1 \le 1.$$
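The two-month rule is easy to simulate. In the Python sketch below, which is illustrative only, `random.gauss` supplies the RNDs (rather than the 12-number sum of the construction above), and the sample correlation of 100,000 simulated pairs should come out close to $\rho_1$. Each month's value is N(0,1); the copied pairs make the joint distribution a mixture, which is one reason for the averaging step discussed later.

```python
import random


def two_month_pair(rho1, rng):
    """Values for months 1 and 2 under the two-month copying rule."""
    x = rng.gauss(0.0, 1.0)           # RND for month 1
    if rng.random() < rho1:           # copying rule 1, frequency rho1
        return x, x
    return x, rng.gauss(0.0, 1.0)     # copying rule 0, frequency 1 - rho1


def correlation(pairs):
    """Sample correlation of a list of (month 1, month 2) values."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
pairs = [two_month_pair(0.42, rng) for _ in range(100_000)]
print(round(correlation(pairs), 3))  # should be near rho1 = 0.42
```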

THREE  MONTHS 

For  a  pattern  extending  over  3  or  more 
months,  we  limit  ourselves  to  copying  from  the 
preceding  month.   (Copying  from  two  months 
ago,  or  longer  ago,  is  a  procedure  which  could 
be  used;  however,  the  limitation  to  copying 
from  the  month  immediately  preceding  leads  to 
a  simpler  rule,  which  has  fewer  limitations 
on  the  relationships  between  the  correlations.) 

Over a 3-month period, there are then four possible sample patterns, and four copying rules:


Pattern   Month          Copying   Frequency
number    1    2    3    rule
1         x    x    x    11        $f_1 = \rho_2$
2         x    x    y    10        $f_2 = \rho_1 - \rho_2$
3         x    y    y    01        $f_3 = \rho_1 - \rho_2$
4         x    y    z    00        $f_4 = 1 - 2\rho_1 + \rho_2$

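To show that the frequencies $\{f_i\}$ are indeed the ones shown in the table above, we use the relations

$$
\begin{aligned}
\operatorname{cov}(1,3) &= E\{f_1x^2 + (f_2 + f_3)\,xy + f_4\,xz\} = \rho_2\\
\operatorname{cov}(1,2) &= E\{(f_1 + f_2)\,x^2 + (f_3 + f_4)\,xy\} = \rho_1\\
\operatorname{cov}(2,3) &= E\{f_1x^2 + f_2\,xy + f_3\,y^2 + f_4\,yz\} = \rho_1
\end{aligned}
\tag{1}
$$

and

$$f_1 + f_2 + f_3 + f_4 = 1. \tag{2}$$

Since x, y and z are independent N(0,1) deviates, every squared term has expectation 1 and every cross product has expectation 0, so these relations lead to

$$f_1 = \rho_2, \qquad f_1 + f_2 = f_1 + f_3 = \rho_1, \qquad f_4 = 1 - f_1 - f_2 - f_3. \tag{3}$$

From these equations it is found that $f_1 = \rho_2$, $f_2 = f_3 = \rho_1 - \rho_2$, and $f_4 = 1 - 2\rho_1 + \rho_2$.

The $\{f_i\}$ are uniquely determined, since there are four equations for the four unknowns.

The condition that the values $\{f_i\}$ be acceptable as frequencies (or probabilities) is that each value of $\{f_i\}$ must be non-negative. That is,

$$1 \ge \rho_1 \ge \rho_2 \ge 0, \qquad 1 - 2\rho_1 + \rho_2 \ge 0. \tag{4}$$

The second inequality, above, requires that $\rho_2 \ge 2\rho_1 - 1$, which is an additional limitation on $\rho_2$ when $\rho_1 > .5$.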

Looking at the frequencies in the table, one notices that

$$f_1 + f_2 = \rho_1,$$

which was the $f_1$ for the 2-month pattern, and

$$f_3 + f_4 = 1 - \rho_1,$$

which was the $f_2$ for the 2-month pattern.
Another feature to be observed is that $f_2 = f_3$; these frequencies are associated with the symmetric patterns (x x y) and (x y y), respectively. This symmetry indicates that months 1 and 3 could be interchanged, with no change in the frequencies. In other words, copying pattern (0 1) has the same frequency as pattern (1 0). This fact will be of assistance in the construction of frequencies for more than 3 months.
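The three-month frequencies and the covariances they imply can be computed directly. In the sketch below (illustrative names), the covariance between months i and j is the total frequency of the copying rules that copy at every step from month i to month j, since distinct RNDs are independent with mean 0 and variance 1; the example correlations are the first two CPS values used later in this memorandum.

```python
def three_month_frequencies(rho1, rho2):
    """Frequencies of copying rules 11, 10, 01, 00 (patterns xxx, xxy, xyy, xyz)."""
    f = {"11": rho2, "10": rho1 - rho2, "01": rho1 - rho2, "00": 1 - 2 * rho1 + rho2}
    if min(f.values()) < -1e-12:
        raise ValueError("conditions (4) violated: a frequency is negative")
    return f


def implied_covariance(freqs, i, j):
    """cov(month i, month j) for i < j, months numbered from 1.

    The two months share one RND (covariance 1) only under rules that copy
    at every step from i to j; otherwise their values are independent.
    """
    return sum(f for rule, f in freqs.items()
               if all(rule[k] == "1" for k in range(i - 1, j - 1)))


f = three_month_frequencies(0.42, 0.25)
print(f)
print(implied_covariance(f, 1, 2), implied_covariance(f, 2, 3),
      implied_covariance(f, 1, 3))  # rho1, rho1, rho2 up to rounding
```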

FOUR MONTHS

For 4 months there are eight copying patterns, corresponding to all possible permutations of the digits (0,1) in sets of three. However, the number of equations giving relationships between the variables is not eight, but seven.

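We have:

(a) From the necessary relationship with the 3-month pattern:

$$
\begin{aligned}
f_1 + f_2 &= \rho_2\\
f_3 + f_4 &= \rho_1 - \rho_2\\
f_5 + f_6 &= \rho_1 - \rho_2\\
f_7 + f_8 &= 1 - 2\rho_1 + \rho_2
\end{aligned}
\tag{5}
$$

(b) From symmetry of forms:

$$f_2 = f_5 \quad (\text{copying rule } 110 \sim 011), \qquad f_4 = f_7 \quad (\text{copying rule } 100 \sim 001). \tag{6}$$

(c) New information from Pattern No. 1, in the form

$$f_1 = \rho_3.$$

Using these equations, we find that $f_1$, $f_2$, $f_5$ and $f_6$ are uniquely determined ($f_1 = \rho_3$, $f_2 = f_5 = \rho_2 - \rho_3$, $f_6 = \rho_1 - 2\rho_2 + \rho_3$); however, there is some freedom in the choice of the other four frequencies. If we take

$$f_4 = X, \tag{7}$$

we find

$$f_3 = \rho_1 - \rho_2 - X, \qquad f_7 = X, \qquad f_8 = 1 - 2\rho_1 + \rho_2 - X. \tag{8}$$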


Here X may have any value between 0 and the smaller of

$$\rho_1 - \rho_2 \quad\text{and}\quad 1 - 2\rho_1 + \rho_2.$$

We choose X = 0. There are three desirable consequences of this choice: two of the frequencies are then equal to zero ($f_4$ and $f_7$), so that we have the minimum possible number of non-zero frequencies; the patterns toward the top of the list have the largest allowable frequencies (see diagram below); and the limitations on the correlations are held to a minimum.

Branch diagram No. 1 indicates the development of the sample patterns and their corresponding frequencies.
Branch Diagram No. 1. Four Month Copying Frequency (months 1 to 4)

Copying rule   Frequency
111            $f_1 = \rho_3$
110            $f_2 = \rho_2 - \rho_3$
101            $f_3 = \rho_1 - \rho_2$
100            $f_4 = 0$
011            $f_5 = \rho_2 - \rho_3$
010            $f_6 = \rho_1 - 2\rho_2 + \rho_3$
001            $f_7 = 0$
000            $f_8 = 1 - 2\rho_1 + \rho_2$


REMARKS

(1) With the frequencies in the diagram above, we find that, in order that none of the frequencies be negative, we must have

$$1 \ge \rho_1 \ge \rho_2 \ge \rho_3 \ge 0, \qquad 1 - \rho_1 \ge \rho_1 - \rho_2 \ge \rho_2 - \rho_3 \ge 0. \tag{9}$$

(2) If we had made X as large as possible ($X = \rho_1 - \rho_2$), we should have had

$$f_8 = 1 - 3\rho_1 + 2\rho_2,$$

and a further restriction on $\rho_2$ would be necessary:

$$2\rho_2 \ge 3\rho_1 - 1.$$

Moreover, with this choice of X, only $f_3$ would be equal to zero, and there would be seven patterns with non-zero frequencies.

(3) A choice of X between 0 and $\rho_1 - \rho_2$ would have led to eight non-zero frequencies, and to undesirable restrictions on the correlations.
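The effect of the free frequency X can be explored numerically. The Python sketch below builds $f_1, \ldots, f_8$ from equations (5) through (8), checks that none is negative and that every pair of months has the required correlation, and counts the non-zero frequencies for X = 0 and for X = $\rho_1 - \rho_2$. The correlation values are the first three composite-estimate values quoted later; the function names are hypothetical.

```python
def four_month_frequencies(rho1, rho2, rho3, X=0.0):
    """Frequencies f_1..f_8 of the copying rules, from equations (5)-(8).

    X is the free frequency f_4 (= f_7); the text chooses X = 0.
    """
    return {
        "111": rho3,                       # f_1
        "110": rho2 - rho3,                # f_2
        "101": rho1 - rho2 - X,            # f_3
        "100": X,                          # f_4
        "011": rho2 - rho3,                # f_5
        "010": rho1 - 2 * rho2 + rho3,     # f_6
        "001": X,                          # f_7
        "000": 1 - 2 * rho1 + rho2 - X,    # f_8
    }


def implied_covariance(freqs, i, j):
    """cov(month i, month j): frequency of rules copying at every step i -> j."""
    return sum(f for rule, f in freqs.items()
               if all(rule[k] == "1" for k in range(i - 1, j - 1)))


def acceptable(freqs, rhos, tol=1e-12):
    """True if no frequency is negative and cov(i, j) = rho_(j-i) for all pairs."""
    if min(freqs.values()) < -tol:
        return False
    months = len(next(iter(freqs))) + 1
    return all(abs(implied_covariance(freqs, i, j) - rhos[j - i - 1]) < tol
               for i in range(1, months) for j in range(i + 1, months + 1))


rhos = (0.4205, 0.2500, 0.1397)
f_zero = four_month_frequencies(*rhos)                       # X = 0
f_max = four_month_frequencies(*rhos, X=rhos[0] - rhos[1])   # X = rho1 - rho2
print(acceptable(f_zero, rhos), sum(v != 0 for v in f_zero.values()))
print(acceptable(f_max, rhos), sum(v != 0 for v in f_max.values()))
```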


SEVEN MONTHS

The principles we are using in developing patterns are

(1) Copying can be done only from the preceding month.

(2) The frequencies in any month must add, in pairs, to the frequencies of the preceding month.

(3) The number of patterns with non-zero frequencies is to be a minimum.

With these rules it is possible to determine, successively, the patterns and frequencies for any number of months. Branch diagram No. 2 shows the copying frequencies over a 7-month period.

The frequencies must all be non-negative, so we must have

$$
\begin{aligned}
&1 \ge \rho_1 \ge \rho_2 \ge \rho_3 \ge \rho_4 \ge \rho_5 \ge \rho_6 \ge 0,\\
&1 - \rho_1 \ge \rho_1 - \rho_2 \ge \rho_2 - \rho_3 \ge \rho_3 - \rho_4 \ge \rho_4 - \rho_5 \ge \rho_5 - \rho_6 \ge 0.
\end{aligned}
\tag{10}
$$
REMARKS

As will be seen in branch diagram No. 2, these principles lead to only 16 distinct types of 7-month patterns. All but one of the pattern types have some copying from at least 1 month to the next. For a normal population,1/ we may disguise the obvious copying by adding a number of the 7-month patterns, month by month (say 16 of them), and dividing by the square root of the number added (4) in order to preserve the magnitude of the variance; the correlations are unchanged by this averaging. This averaging also has the feature of making the final numbers more nearly normally distributed, since each of them is now based on 12 x 16 = 192 rectangularly distributed pseudo-random numbers.

1/ Such an averaging procedure would not be appropriate if the distribution being constructed were not normal, for example, if it were a rectangular distribution.
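The averaging step can be illustrated on the two-month rule, where it is easiest to see. In the sketch below (an illustration with hypothetical names, using `random.gauss` for the RNDs), single copied pairs show many exactly equal values, whereas sums of k = 16 independent pairs divided by 4 keep unit variance and correlation $\rho_1$ while exact repeats essentially disappear, since they would require copying in all 16 pairs.

```python
import random


def copied_pair(rho1, rng):
    """Two-month copying rule: copy with probability rho1, else a new RND."""
    x = rng.gauss(0.0, 1.0)
    return (x, x) if rng.random() < rho1 else (x, rng.gauss(0.0, 1.0))


def averaged_pair(rho1, rng, k=16):
    """Add k independent copied pairs month by month; divide by sqrt(k)."""
    s1 = s2 = 0.0
    for _ in range(k):
        a, b = copied_pair(rho1, rng)
        s1 += a
        s2 += b
    root_k = k ** 0.5  # 4 when k = 16
    return s1 / root_k, s2 / root_k


def summarize(pairs):
    """(variance of month 1, correlation, share of pairs with equal values)."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    c = sum((a - m1) * (b - m2) for a, b in pairs) / n
    ties = sum(a == b for a, b in pairs) / n
    return v1, c / (v1 * v2) ** 0.5, ties


rng = random.Random(16)
single = [copied_pair(0.42, rng) for _ in range(10_000)]
averaged = [averaged_pair(0.42, rng) for _ in range(10_000)]
print(summarize(single))    # copying shows up as many exact ties
print(summarize(averaged))  # similar variance and correlation, ties vanish
```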


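Branch Diagram No. 2. Seven Month Copying Frequency (months 1 to 7)

Pattern   Copying
number    rule       Frequency
1         111111     $f_1 = \rho_6$
2         111110     $f_2 = \rho_5 - \rho_6$
3         111101     $f_3 = \rho_4 - \rho_5$
5         111011     $f_5 = \rho_3 - \rho_4$
9         110111     $f_9 = \rho_3 - \rho_4$
10        110110     $f_{10} = \rho_2 - 2\rho_3 + \rho_4$
17        101111     $f_{17} = \rho_4 - \rho_5$
18        101110     $f_{18} = \rho_3 - 2\rho_4 + \rho_5$
19        101101     $f_{19} = \rho_2 - 2\rho_3 + \rho_4$
22        101010     $f_{22} = \rho_1 - 2\rho_2 + \rho_3$
33        011111     $f_{33} = \rho_5 - \rho_6$
34        011110     $f_{34} = \rho_4 - 2\rho_5 + \rho_6$
35        011101     $f_{35} = \rho_3 - 2\rho_4 + \rho_5$
37        011011     $f_{37} = \rho_2 - 2\rho_3 + \rho_4$
43        010101     $f_{43} = \rho_1 - 2\rho_2 + \rho_3$
64        000000     $f_{64} = 1 - 2\rho_1 + \rho_2$

All other copying rules have zero frequency.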
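Branch diagram No. 2 can be verified exactly. The Python sketch below (function names are illustrative) stores the 16 copying rules with their frequencies, checks inequalities (10), and confirms that for every pair of months i < j the total frequency of rules copying at every step from i to j equals $\rho_{j-i}$. The correlations are the composite-estimate values quoted in the next section.

```python
def seven_month_rules(rho):
    """Copying rules of branch diagram No. 2 and their frequencies.

    rho = (rho_1, ..., rho_6); digit k of a rule is "1" if month k+1 copies month k.
    """
    p1, p2, p3, p4, p5, p6 = rho
    return {
        "111111": p6,
        "111110": p5 - p6,
        "111101": p4 - p5,
        "111011": p3 - p4,
        "110111": p3 - p4,
        "110110": p2 - 2 * p3 + p4,
        "101111": p4 - p5,
        "101110": p3 - 2 * p4 + p5,
        "101101": p2 - 2 * p3 + p4,
        "101010": p1 - 2 * p2 + p3,
        "011111": p5 - p6,
        "011110": p4 - 2 * p5 + p6,
        "011101": p3 - 2 * p4 + p5,
        "011011": p2 - 2 * p3 + p4,
        "010101": p1 - 2 * p2 + p3,
        "000000": 1 - 2 * p1 + p2,
    }


def conditions_10(rho):
    """Inequalities (10) of the text."""
    levels = [1.0, *rho, 0.0]
    gaps = [a - b for a, b in zip(levels, levels[1:])]  # 1-rho1, ..., rho5-rho6, rho6
    steps = gaps[:-1]                                    # 1-rho1, ..., rho5-rho6
    return (all(g >= 0 for g in gaps)
            and all(a >= b for a, b in zip(steps, steps[1:])))


def implied_covariance(rules, i, j):
    """cov(month i, month j), 1 <= i < j <= 7, for unit-variance deviates."""
    return sum(f for rule, f in rules.items()
               if all(rule[k] == "1" for k in range(i - 1, j - 1)))


CPS_RHO = (0.4205, 0.2500, 0.1397, 0.0712, 0.0384, 0.0247)
rules = seven_month_rules(CPS_RHO)
worst = max(abs(implied_covariance(rules, i, j) - CPS_RHO[j - i - 1])
            for i in range(1, 7) for j in range(i + 1, 8))
print(conditions_10(CPS_RHO), min(rules.values()) >= 0, worst < 1e-12)
```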
APPLICATION--CPS ESTIMATE OF UNEMPLOYMENT

The problem of determining whether a change over time is significant, for a characteristic which is estimated from a sample, can be examined by creating, by a Monte Carlo procedure, a multivariate population which has approximately the same distribution as the distribution of the estimates--in particular, which has the same expected values, variances, and covariances, and the same higher moments.

For example, the monthly estimate of unemployment from the Current Population Survey (CPS) may be considered to be approximately the mean of a large sample, and hence to be approximately normally distributed.

Over a period of time, such as 7 months, one may wish to know how frequently the estimate will rise in four out of six month-to-month comparisons, when there is, in fact, no change; or how frequently the change from the first to the last month seems to be as much as 1 percentage point,2/ when in fact it is only 6/10 of a percentage point; or how frequently the estimate will fail to detect a true rise of 1 percentage point.

We assume that over a short period of time (e.g., up to 7 months) the CPS estimate of unemployment is the result of a stationary process. To simulate the population of all possible estimates, we may add to the CPS estimate, at each month, a Random Normal Deviate (RND), whose mean is zero; whose variance is the variance of the CPS estimate of unemployment; and such that the correlations between the estimates for the various months will be those of the CPS sample design.3/

2/ See paragraph above on background.

3/ The correlations used here take into account the Composite Estimation Procedure used for the CPS survey.

Repetition of this construction leads to as many different estimates of the 6-month change in the CPS estimate as we like. From these we may determine the frequency of increase for any particular growth pattern.

Tables A and B of the appendix show the number of increases, with 100 replications, for four different growth patterns. In the computation the following correlations have been used, from the composite estimate for unemployment:

$$\rho_1 = .4205, \quad \rho_2 = .2500, \quad \rho_3 = .1397, \quad \rho_4 = .0712, \quad \rho_5 = .0384, \quad \rho_6 = .0247.$$

The standard deviation is taken as $\sigma = .001416$, the same for each month.

From table A we see that the number of replications having increases in four (or six) successive months is governed by the growth pattern: for pattern 1 (with no change month-to-month) there are no replications with six successive increases, and only two replications with four increases in the first 5 months; for pattern 4 (with an increase of .25 percentage point each month) there is an increase in six successive months for 70 of the replications.

Table B shows the number of replications having an absolute increase of 1 percentage point, over the 5- and 7-month periods. For growth pattern 3, which has an increase of 1 percentage point over 7 months, we find that four replications indicate an increase of 1 percentage point at the end of 5 months, while 62 replications show an increase of 1 percentage point at the end of 7 months. These are absolute increases; if we allow for sampling error, then 90 percent of the replications show an increase of 1 percentage point over 7 months, at the 1σ level; at 2σ the proportion showing increases is 98 percent, and all 100 replications show increases at 3σ.

The joint criterion of the President's Economic Report is equivalent to table B, for this computation, since the standard deviation is so small (.1416 percent) relative to the increase over the whole period (1 percent).

Consequently, with pattern 3, we might flag an increase at the 1σ level after 5 months 22 percent of the time, when the increase was in fact smaller than that required by the criterion; on the other hand we might fail to detect a true increase 10 percent of the time, at the 3σ level.

COMPUTER  PROGRAM 

An  1105  computer  program  has  been  written 
to  create  a  population  having  the  desired 
properties.  The  population  is  constructed  by 
a  Monte  Carlo  procedure. 

(a)  Over  a  7-month  period,  seven 
estimates  are  created  which  are 
normally  distributed. 

(b)  The  process  is  stationary:   that  is, 
the  variances  and  correlations  do 
not  depend  on  which  months  are 
being  compared. 

(c)  The  number  of  iterations  may  be  as 
large  as  is  desired. 

(d)  The  expected  value  may  be  specified, 
for  each  month. 

(e)  The  standard  deviation  (the  same  for 
all  months)  may  be  specified. 


For example, in investigating the CPS estimate of unemployment, we may call the expected value in the i-th month $P_i$; a suitable set of values of $P_i$ might be $P_1 = 5.5$ percent, $P_2 = 5.6$ percent, $P_3 = 5.7$ percent, ..., $P_7 = 6.1$ percent. A corresponding value of the standard deviation is .001416 = .1416 percent. Using this set of parameters, and correlations for the composite estimate, we obtain the results of growth pattern 2 in the appendix.
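The complete procedure can be sketched in Python as follows. This is a hypothetical reconstruction, not the UNIVAC 1105 program: it draws copying rules from branch diagram No. 2, averages 16 copied 7-month patterns and divides by 4, scales by $\sigma = .001416$ and adds the expected value for each month (expressed as proportions, so .055 is 5.5 percent), and counts month-to-month increases. It uses `random.gauss` for the RNDs and will not reproduce tables A and B, which come from the original runs.

```python
import random

RHO = (0.4205, 0.2500, 0.1397, 0.0712, 0.0384, 0.0247)  # composite-estimate correlations
SIGMA = 0.001416                                          # standard deviation, each month


def rule_frequencies(rho):
    """The 16 copying rules of branch diagram No. 2 with their frequencies."""
    p1, p2, p3, p4, p5, p6 = rho
    return {
        "111111": p6, "111110": p5 - p6, "111101": p4 - p5,
        "111011": p3 - p4, "110111": p3 - p4, "110110": p2 - 2 * p3 + p4,
        "101111": p4 - p5, "101110": p3 - 2 * p4 + p5, "101101": p2 - 2 * p3 + p4,
        "101010": p1 - 2 * p2 + p3, "011111": p5 - p6, "011110": p4 - 2 * p5 + p6,
        "011101": p3 - 2 * p4 + p5, "011011": p2 - 2 * p3 + p4,
        "010101": p1 - 2 * p2 + p3, "000000": 1 - 2 * p1 + p2,
    }


def seven_month_deviates(rules, weights, rng, k=16):
    """Average k copied 7-month patterns, month by month, and divide by sqrt(k)."""
    totals = [0.0] * 7
    for _ in range(k):
        rule = rng.choices(rules, weights)[0]
        values = [rng.gauss(0.0, 1.0)]
        for digit in rule:  # "1": copy the preceding month, "0": new RND
            values.append(values[-1] if digit == "1" else rng.gauss(0.0, 1.0))
        totals = [t + v for t, v in zip(totals, values)]
    return [t / k ** 0.5 for t in totals]


def simulate(means, replications, seed=0):
    """Simulated estimates: expected value for each month plus SIGMA times a deviate."""
    freqs = rule_frequencies(RHO)
    rules, weights = list(freqs), list(freqs.values())
    rng = random.Random(seed)
    return [[m + SIGMA * z for m, z in zip(means, seven_month_deviates(rules, weights, rng))]
            for _ in range(replications)]


def increases(series):
    """Number of month-to-month increases in one 7-month series."""
    return sum(later > earlier for earlier, later in zip(series, series[1:]))


# Growth pattern 2: 5.5 percent, rising by .1 percentage point per month.
pattern_2 = [0.055 + 0.001 * i for i in range(7)]
runs = simulate(pattern_2, replications=100, seed=1)
print(sum(increases(s) == 6 for s in runs), "of 100 replications rose in all six months")
```

Replacing `pattern_2` with the other growth patterns gives tallies of the same kind as table A; the counts will differ from the published ones because the random numbers differ.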

EXTENSIONS 

The  method  employed  may  be  used  under 
certain  more  general  circumstances. 

(1) The condition of stationarity may be relaxed somewhat: the correlation between successive months need not be always the same; e.g., we may have $\rho_{12} \ne \rho_{23}$. Similarly, $\rho_{13} \ne \rho_{24}$ is a possible relationship between correlations. Upon these more general correlations there will, of course, be some limiting relationships. For a pattern over 4 months we must have $\rho_{12} - \rho_{13} = \rho_{34} - \rho_{24}$, since $f_4$ and $f_7 = -f_4$ below must both be non-negative.

The table in the section on 4 months would be replaced by
Pattern   Month        Copying   Frequency
number    1  2  3  4   rule
1         x  x  x  x   111       $f_1 = \rho_{14}$
2         x  x  x  y   110       $f_2 = \rho_{13} - \rho_{14}$
3         x  x  y  y   101       $f_3 = \rho_{34} - \rho_{24}$
4         x  x  y  z   100       $f_4 = \rho_{12} - \rho_{13} - \rho_{34} + \rho_{24} = 0$
5         x  y  y  y   011       $f_5 = \rho_{24} - \rho_{14}$
6         x  y  y  z   010       $f_6 = \rho_{23} - \rho_{13} - \rho_{24} + \rho_{14}$
7         x  y  z  z   001       $f_7 = -f_4 = 0$
8         x  y  z  w   000       $f_8 = 1 - \rho_{23} - \rho_{34} + \rho_{24}$
                                                                        (11)


(2) If the means are different at the different months (e.g., Ex = X, Ey = Y, etc.), a modification in the "copying rule" is required. For a 2-month period, for example, we should have

Month 1   Month 2       Frequency
x         (x-X) + Y     $f_1 = \rho_1$
x         y             $f_2 = 1 - \rho_1$

(3) The distribution need not be multivariate normal. The copying procedure can be applied to any multivariate population that has the same distribution (with possibly different means, but the same higher moments about the mean) at each month. Consequently, we can create a multivariate population with specified correlations over time for any distribution that can be obtained (either by direct or by approximate methods) from a rectangular population. Populations with distributions such as the chi-square, gamma, or triangular are in this category.

Other distributions can also be treated if we use tables to convert from one probability distribution to another.
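As an illustration of item (3), the sketch below applies the copying rule to rectangular (uniform) numbers and then converts each number to a chi-square variable with 2 degrees of freedom through its inverse distribution function, $F^{-1}(u) = -2\log(1-u)$. The choice of chi-square(2), the function names and the value of $\rho$ are assumptions made for the example. Because a copied month reuses the same uniform number, it also reuses the same chi-square value, so the correlation is still $\rho$ while each month keeps the chi-square(2) distribution (mean 2).

```python
import math
import random


def chisq2_inverse_cdf(u):
    """Inverse CDF of chi-square with 2 d.f.: F^{-1}(u) = -2 log(1 - u)."""
    return -2.0 * math.log(1.0 - u)


def copied_chisq2_pairs(n, rho, seed=2):
    """Month-1/month-2 pairs of chi-square(2) values with correlation rho.

    The copying rule is applied to rectangular numbers in [0, 1); a copied
    month reuses the same uniform number, hence the same chi-square value.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u1 = rng.random()                                # rectangular draw
        u2 = u1 if rng.random() < rho else rng.random()  # copy or fresh draw
        pairs.append((chisq2_inverse_cdf(u1), chisq2_inverse_cdf(u2)))
    return pairs


def correlation(pairs):
    """Sample correlation between the two months."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / math.sqrt(s11 * s22)


print(correlation(copied_chisq2_pairs(100_000, rho=0.5)))
```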
(4) The most general stationary multivariate normal population that can be generated by a linear transformation can be found as follows:

(a) Let $\{x_i\}$, $i = 1, \dots, n$, be a random normal deviate in each of $n$ months. Suppose the $\{x_i\}$ are independently distributed $N(0,1)$.

(b) Obtain $\{y_i\}$ by the transformation

$$y_i = \sum_{j=1}^{i} a_{ij}\,x_j, \qquad i = 1, \dots, n. \qquad (12)$$

(c) Determine the matrix $(a_{ij})$ so that

$$\operatorname{cov}(y_i, y_{i+k}) = \rho_k, \qquad k = 0, 1, \dots, n-1.$$

This ensures that the transformation will induce the desired correlations.

In matrix notation we have

$$Y = A X \qquad (13)$$

where $A$ is a lower triangular matrix such that

$$A A' = (\sigma_{ij}),$$

the covariance matrix. The only restriction is that the covariance matrix be non-negative definite. The matrix equation $AA' = (\sigma_{ij})$ can be solved for $A$ by well-known methods. We then determine $\{y_i\}$ in terms of $\{x_i\}$; the copying rules are replaced by a linear transformation of the $\{x_i\}$.
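One of the "well-known methods" for solving $AA' = (\sigma_{ij})$ with $A$ lower triangular is the Cholesky factorization. The pure-Python sketch below uses it; the function names are hypothetical, and setting a column to zero in the singular (semidefinite) case is a simplification. It builds the stationary covariance matrix from $\rho_0, \rho_1, \dots, \rho_{n-1}$, factors it, and applies the transformation in (b). For $\rho_1 = 0.6$, $\rho_2 = 0.3$ (illustrative values) the factor agrees with the 3-month closed form.

```python
import math
import random


def toeplitz_cov(rhos):
    """Covariance matrix with cov(y_i, y_{i+k}) = rhos[k]."""
    n = len(rhos)
    return [[rhos[abs(i - j)] for j in range(n)] for i in range(n)]


def cholesky_lower(S, tol=1e-12):
    """Lower-triangular A with A A' = S (Cholesky factorization)."""
    n = len(S)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                d = S[i][i] - s
                if d < -tol:
                    raise ValueError("covariance matrix is not non-negative definite")
                A[i][i] = math.sqrt(max(d, 0.0))
            elif A[j][j] > 0:
                A[i][j] = (S[i][j] - s) / A[j][j]
            # else: singular case, leave A[i][j] = 0.0 (simplification)
    return A


def transform(A, x):
    """y_i = sum_{j <= i} a_ij x_j, the transformation in step (b)."""
    return [sum(A[i][j] * x[j] for j in range(i + 1)) for i in range(len(A))]


if __name__ == "__main__":
    A = cholesky_lower(toeplitz_cov([1.0, 0.6, 0.3]))
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]   # independent N(0, 1)
    print(A)
    print(transform(A, x))
```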

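For a 3-month period, with correlations $\rho_1$ and $\rho_2$ and with unit variance at each month, the covariance matrix is

$$\begin{pmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{pmatrix}$$

and the solution is

$$\begin{aligned}
y_1 &= x_1 \\
y_2 &= \rho_1 x_1 + \sqrt{1-\rho_1^2}\,x_2 \\
y_3 &= \rho_2 x_1 + \frac{\rho_1(1-\rho_2)}{\sqrt{1-\rho_1^2}}\,x_2 + \sqrt{(1-\rho_2^2) - \frac{(\rho_1-\rho_1\rho_2)^2}{1-\rho_1^2}}\;x_3
\end{aligned} \qquad (14)$$

Here the only condition on the correlations is that the determinant $|\rho_{ij}|$ of the correlation matrix be $\geq 0$. This is more general than the conditions (see equation 10) of the copying rule.


REMARKS

This method of construction is applicable only to a normal distribution: for any other distribution, a linear combination of a number of elements tends toward the normal distribution as the number of elements combined becomes large, so the original form of the distribution is not preserved.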
APPENDIX

Number of increases, with 100 replications, for growth patterns 1, 2, 3 and 4.

                                    Growth pattern
                                  1       2       3        4
Estimate at first month         .055    .055    .055     .055
Increase per month              .000    .001    .00167   .0025
Estimate at seventh month       .055    .061    .065     .070

Table A--Number of Replications Having Increases in Period Indicated

Period          No. of        Growth pattern
                increases     1     2     3     4
Five months     4             2    19    54    77
                3            27    55    40    23
                2            41    24     6     0
                1            22     2     0     0
                0             2     0     0     0
Seven months    6             0     9    43    70
                5             4    44    44    29
                4            29    15    13     1
                3 or less    67    32     0     0

Table B--Number of Replications Having an Absolute Increase of 1 Percentage Point

Period and number of increases        Growth pattern
                                    1     2     3     4
Over five months:
  Absolute increase                 0     0     4    54
  Within limits of 1σ               0     0    18    28
  Within limits of 2σ               0    11    38    17
  Within limits of 3σ               1    35    30     1
  More than 3σ                     99    54    10     0
Over seven months:
  Absolute increase                 0     4    62    99
  Within limits of 1σ               0    12    28     1
  Within limits of 2σ               0     4     8     0
  Within limits of 3σ               0    30     2     0
  More than 3σ                    100    11     0     0

NOTE  ON  RESTRICTED  ESTIMATES 

Benjamin  J.  Tepping 


Sometimes it is known that a population parameter lies in a specified interval, but the estimator being used has positive probability outside that interval. In such a case it seems intuitively reasonable to replace an unbiased estimate that happens to fall outside the specified interval by the nearer endpoint of the interval. It will be shown in this note that this procedure reduces the mean square error.

Let $u$ be a statistic such that

$$a \le Eu = U \le b.$$

Define a statistic

$$v = \begin{cases} a & \text{if } u < a, \\ u & \text{if } a \le u \le b, \\ b & \text{if } u > b. \end{cases}$$


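Then

$$\begin{aligned}
\mathrm{MSE}(v) = E(v-U)^2 &= P(u<a)\,E[(v-U)^2 \mid u<a] \\
&\quad + P(a\le u\le b)\,E[(v-U)^2 \mid a\le u\le b] \\
&\quad + P(u>b)\,E[(v-U)^2 \mid u>b].
\end{aligned}$$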

Since $v = a$ when $u < a$, $v = b$ when $u > b$, and $v = u$ when $a \le u \le b$, we have

$$\mathrm{MSE}(v) = P(u<a)(a-U)^2 + P(a\le u\le b)\,E[(u-U)^2 \mid a\le u\le b] + P(u>b)(b-U)^2.$$

Now add and subtract each of the quantities

$$P(u<a)\,E[(u-U)^2 \mid u<a] \quad\text{and}\quad P(u>b)\,E[(u-U)^2 \mid u>b].$$

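Adding these quantities to the middle term makes it $\sigma_u^2$ (because $Eu = U$, the three conditional pieces together make up $E(u-U)^2 = \sigma_u^2$), so that

$$\mathrm{MSE}(v) = P(u<a)\big\{(a-U)^2 - E[(u-U)^2 \mid u<a]\big\} + P(u>b)\big\{(b-U)^2 - E[(u-U)^2 \mid u>b]\big\} + \sigma_u^2.$$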


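Writing $u - U = (u-a) - (U-a)$ in the first brace and $u - U = (u-b) + (b-U)$ in the second, and expanding the squares, gives

$$\begin{aligned}
\mathrm{MSE}(v) &= P(u<a)\big\{-E[(u-a)^2 \mid u<a] + 2(U-a)\,E[(u-a) \mid u<a]\big\} \\
&\quad + P(u>b)\big\{-E[(u-b)^2 \mid u>b] - 2(b-U)\,E[(u-b) \mid u>b]\big\} + \sigma_u^2.
\end{aligned}$$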


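Each expression in braces is negative: its squared term enters with a minus sign, and its second term is non-positive, because $U - a \ge 0$ while $E[(u-a) \mid u<a] < 0$, and $b - U \ge 0$ while $E[(u-b) \mid u>b] > 0$. Hence, whenever $P(u<a) > 0$ or $P(u>b) > 0$, it follows that

$$\mathrm{MSE}(v) < \sigma_u^2;$$

that is, the restriction reduces the mean square error whenever the distribution of $u$ has positive probability outside the interval $(a, b)$.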
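A small Monte Carlo sketch of this result, assuming for illustration that $u \sim N(U, \sigma^2)$ with $U = 0.3$, $\sigma = 1$ and the interval $[0, 1]$; the function names and all parameter values are hypothetical. It estimates the mean square errors of $u$ and of the restricted statistic $v$.

```python
import random


def restrict(u, a, b):
    """v: replace u by the nearer endpoint of [a, b] when u falls outside."""
    return min(max(u, a), b)


def simulated_mse(U, sigma, a, b, reps=100_000, seed=3):
    """Monte Carlo MSE of u and of v = restrict(u) for u ~ N(U, sigma^2)."""
    rng = random.Random(seed)
    sum_u = sum_v = 0.0
    for _ in range(reps):
        u = rng.gauss(U, sigma)          # unbiased: E u = U
        sum_u += (u - U) ** 2
        sum_v += (restrict(u, a, b) - U) ** 2
    return sum_u / reps, sum_v / reps


mse_u, mse_v = simulated_mse(U=0.3, sigma=1.0, a=0.0, b=1.0)
print(f"MSE(u) = {mse_u:.3f}, MSE(v) = {mse_v:.3f}")
```

Because $U$ lies in $[a, b]$, moving $u$ to the nearer endpoint never increases $|v - U|$, so in the simulation the inequality holds replication by replication, with a strict gain from every draw that falls outside the interval.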

An example often met is the estimation of a variance when the estimate may take on negative values. Replacing negative values of the estimate by 0 therefore reduces the mean square error.

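We note, however, that the mean of independent estimates $v_i$ does not necessarily have a smaller mean square error than the mean of the corresponding estimates $u_i$. For let

$$\bar v = \frac{1}{n}\sum_{i=1}^{n} v_i,$$

where the $v_i$ are independent and identically distributed. Then

$$\begin{aligned}
\mathrm{MSE}(\bar v) = E(\bar v - U)^2 &= E\Big[\frac{1}{n}\sum_i (v_i - U)\Big]^2 \\
&= \frac{1}{n^2}\,E\Big\{\sum_i (v_i-U)^2 + \sum_{i \ne j} (v_i-U)(v_j-U)\Big\} \\
&= \frac{1}{n^2}\big\{n\,\mathrm{MSE}(v_i) + n(n-1)\,[E(v_i-U)]^2\big\} \\
&= \frac{1}{n}\,\mathrm{MSE}(v_i) + \frac{n-1}{n}\,B^2(v_i),
\end{aligned}$$

where $B(v_i)$ denotes the bias of $v_i$. Thus, as $n$ increases, $\mathrm{MSE}(\bar v)$ approaches $B^2(v_i)$, whereas the variance of $\bar u$ approaches zero.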
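A numerical sketch of this effect, assuming for illustration that each $u_i \sim N(U, \sigma^2)$ and $v_i = \max(u_i, 0)$ (the case $a = 0$, $b = \infty$), with $U = 0.2$ and $\sigma = 1$; the function names are hypothetical. It uses the standard moments of a normal variable truncated at zero, $E v = U\Phi(c) + \sigma\varphi(c)$ and $E v^2 = (U^2+\sigma^2)\Phi(c) + U\sigma\varphi(c)$ with $c = U/\sigma$, and compares $\mathrm{MSE}(\bar v)$ with $\operatorname{Var}(\bar u) = \sigma^2/n$.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def clipped_normal_bias_mse(U, sigma):
    """Bias B(v) and MSE(v) of v = max(u, 0) when u ~ N(U, sigma^2)."""
    c = U / sigma
    ev = U * norm_cdf(c) + sigma * norm_pdf(c)                          # E v
    ev2 = (U ** 2 + sigma ** 2) * norm_cdf(c) + U * sigma * norm_pdf(c)  # E v^2
    return ev - U, ev2 - 2.0 * U * ev + U ** 2


def mse_of_mean(U, sigma, n):
    """MSE(v_bar) = MSE(v_i)/n + (n - 1)/n * B^2(v_i) for n i.i.d. v_i."""
    bias, mse = clipped_normal_bias_mse(U, sigma)
    return mse / n + (n - 1) / n * bias ** 2


for n in (1, 10, 100, 1000):
    # Second column: MSE of the mean of restricted estimates;
    # third column: variance of the mean of unrestricted estimates.
    print(n, round(mse_of_mean(0.2, 1.0, n), 4), 1.0 / n)
```

With these values a single restricted estimate has a smaller mean square error than a single unrestricted one, but by $n = 100$ the squared bias dominates and the mean of the $v_i$ is worse than the mean of the $u_i$.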
If $u = \bar u$ is the mean of a sample of $n$ independent observations $u_1, \dots, u_n$, the variance of $\bar u$ may be estimated by

$$s_{\bar u}^2 = \frac{1}{n(n-1)}\sum_{i=1}^{n}(u_i - \bar u)^2.$$

If $Eu = U$ and it is known that $U$ is non-negative, we may define the restricted estimator

$$v = \begin{cases} u & \text{if } u \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$


We may then wish to estimate the mean square error (MSE) of $v$. From the preceding expression for $\mathrm{MSE}(v)$, taking $a = 0$ and $b = \infty$, we may write

$$\mathrm{MSE}(v) = P(u<0)\,U^2 - P(u<0)\,E[(u-U)^2 \mid u<0] + \sigma_u^2.$$

We  may  consider  estimating  each  of  these  three 
terms  separately. 

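Now $\bar u^2 - s_{\bar u}^2$ is an unbiased estimator of $U^2$, since $E\bar u^2 = U^2 + \sigma_{\bar u}^2$ and $E s_{\bar u}^2 = \sigma_{\bar u}^2$.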

If the sample size $n$ is sufficiently large, $\bar u$ is approximately normally distributed, which suggests that $P(u<0)$ be estimated by the integral

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-\bar u/s_{\bar u}} \exp\!\left(-\tfrac{1}{2}t^2\right)dt.$$

Similarly the second term may be estimated by the integral

$$\frac{1}{\sqrt{2\pi}\,s_{\bar u}}\int_{-\infty}^{0}(t-\bar u)^2\exp\!\left[-\frac{(t-\bar u)^2}{2s_{\bar u}^2}\right]dt,$$

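which is obtained by substituting the estimate $\bar u$ for $U$ and $s_{\bar u}$ for $\sigma_{\bar u}$ in the normal integral with parameters $U$ and $\sigma_{\bar u}^2$. After the substitution $z = (t-\bar u)/s_{\bar u}$ and integration by parts, the integral may be written

$$\frac{\bar u\,s_{\bar u}}{\sqrt{2\pi}}\exp\!\left(-\frac{\bar u^2}{2s_{\bar u}^2}\right) + \frac{s_{\bar u}^2}{\sqrt{2\pi}}\int_{-\infty}^{-\bar u/s_{\bar u}}\exp\!\left(-\tfrac{1}{2}t^2\right)dt.$$

The third term may be estimated as usual by $s_{\bar u}^2$. Collecting the three terms, we have the estimator

$$\widehat{\mathrm{MSE}}(v) = \frac{\bar u^2 - 2s_{\bar u}^2}{\sqrt{2\pi}}\int_{-\infty}^{-\bar u/s_{\bar u}}\exp\!\left(-\tfrac{1}{2}t^2\right)dt - \frac{\bar u\,s_{\bar u}}{\sqrt{2\pi}}\exp\!\left(-\frac{\bar u^2}{2s_{\bar u}^2}\right) + s_{\bar u}^2,$$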


which  can  be  evaluated  by  reference  to  a 
table  of  the  standardized  normal  distribution. 
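The estimator can also be evaluated directly, with `math.erfc` standing in for the table of the standardized normal distribution. A minimal sketch with hypothetical function names: it computes $(\bar u^2 - 2s_{\bar u}^2)\,\Phi(-\bar u/s_{\bar u}) - \bar u\,s_{\bar u}\,\varphi(\bar u/s_{\bar u}) + s_{\bar u}^2$ from $\bar u$ and $s_{\bar u}$, or from a list of observations.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, in place of a printed table."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def estimated_mse_restricted(u_bar, s):
    """Estimated MSE of v = max(u_bar, 0), with u_bar approx N(U, s^2)."""
    z = -u_bar / s
    tail = norm_cdf(z)                                        # estimates P(u < 0)
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (u_bar ** 2 - 2.0 * s ** 2) * tail - u_bar * s * density + s ** 2


def estimated_mse_from_sample(obs):
    """Apply the estimator to the mean of a sample of observations."""
    n = len(obs)
    u_bar = sum(obs) / n
    s2 = sum((x - u_bar) ** 2 for x in obs) / (n * (n - 1))  # s_{u_bar}^2
    return estimated_mse_restricted(u_bar, math.sqrt(s2))


print(estimated_mse_restricted(1.0, 1.0))
print(estimated_mse_from_sample([1.0, 2.0, 3.0, 4.0, 5.0]))
```

When $\bar u$ is many standard errors above zero the tail terms vanish and the estimate reduces to $s_{\bar u}^2$, the usual variance estimate.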

The  result  can  be  extended  to  the  sum 
of  independently  distributed  variables,  each 
with  its  own  mean  and  variance. 

The estimator depends on the assumptions that $\bar u$ is distributed nearly normally, and that acceptable estimates of the normal integral and its integrand are obtained by substituting sample estimates for the population mean and variance. In addition, the product $P(\bar u < 0)\,U^2$ is estimated by the product of estimators of the two factors. The sensitivity of the estimator to these assumptions has not been investigated. However, the estimator was applied in estimating the mean square errors of estimates of a large number of response variances. The resulting estimates had reasonable relationships to estimates of the variances of the unbiased estimates of the response variances.

The principal use of estimating the mean square error is to construct confidence intervals. But the latter can be done directly in a simple fashion. For example, suppose it is known that $U$ is non-negative. Consider a number $\alpha$ $(0 < \alpha < 1)$ and let $u_\alpha$ denote the upper $\alpha$-confidence limit:

$$\Pr(u_\alpha > U) = \alpha.$$


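If we let

$$v_\alpha = \begin{cases} u_\alpha & \text{if } u_\alpha > 0, \\ 0 & \text{otherwise,} \end{cases}$$

then, since $U \ge 0$, the event $v_\alpha > U$ occurs exactly when $u_\alpha > U$, so that

$$\Pr(v_\alpha > U) = \alpha.$$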


In particular, let $u = \bar u$, the mean of a sample of $n$, where $n$ is sufficiently large that $\bar u$ is approximately normally distributed. Then, from the usual theory,

$$u_\alpha = \bar u + s_{\bar u}\,x_\alpha,$$

where $x_\alpha$ may be read from a table of the normal distribution:

$$\alpha = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_\alpha}\exp\!\left(-\tfrac{1}{2}t^2\right)dt.$$